TruX-DTF / findbugs-violations


from where and how to start? #28


kimgimkigi commented 2 years ago

Hi, thanks for sharing your work.

I'm trying to use your tool together with other static analysis tools. Before doing that, I want to run your tool with your own environment and datasets.

I ran "git clone {project-name}" for all of the repositories in "repos.list" and successfully obtained 684 of the 730 projects mentioned in the paper (the remaining 46 repositories can no longer be found on the remote server).
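A minimal sketch of this cloning step, assuming each line of repos.list is a clone URL (the file's actual format may differ):

```scala
import scala.io.Source
import scala.sys.process._

// Sketch only: clone every repository listed in repos.list (one URL per line).
// The file path and line format are assumptions, not the project's documented layout.
for (url <- Source.fromFile("repos.list").getLines())
  Seq("git", "clone", url.trim).!
```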

I guess the starting point of your tools is "repo-iterator", so I tried to execute "archive.sh" first.

In "archive.sh" (AchiveCommand.java and GitCommands.scala), There are 3 arguments(gitDir, outputDir, hash) are needed.


I guess gitDir is the path of each project, but I cannot tell what hash is. My guess is that hash means the hash ID of a project commit, but I cannot be sure that is correct, and if it is, I do not know how many commit hashes will be used.

Could you please explain what hash is and what value it expects?

Kui-Liu commented 2 years ago

@kimgimkigi Dr. Dongsun Kim (darkrsw@gmail.com) maintains the data collection; I have just forwarded this to him.

darkrsw commented 2 years ago

@kimgimkigi "hash" is a commit hash. "AchiveCommand" class is for running a "git archive" command. It takes a commit hash to "archive", which copies a vanilla version of the target project's source code without any git-related files. You can copy files of a specific snapshot (i.e., commit hash) of the target project (specified by "gitDir") into a temporary directory ("outputDir") one by one.

Please let me know if you have any further questions.

kimgimkigi commented 2 years ago

@darkrsw Thanks for your reply. But how can I specify the vanilla version of the target project? If I use the first commit ID (82fb5921) of the "acceptance-test-harness" project, I only find README.markdown in the archived zip file. How did you get the vanilla commit ID of each project? Or do you still have the saved commit IDs of each project?

Thanks!

darkrsw commented 2 years ago

@kimgimkigi

> @darkrsw Thanks for your reply. But how can I specify the vanilla version of the target project?

A "vanilla version" is nothing special. If you run the `git archive xxxxxx` command, it produces a .zip file that contains the pure project files without any git-related files.

> If I use the first commit ID (82fb5921) of the "acceptance-test-harness" project, I only find README.markdown in the archived zip file.

This is correct behavior. Since commit 82fb5921 contains only a single file, the zip file of course contains just that one file.

> How did you get the vanilla commit ID of each project?

Again, a vanilla version is nothing special. It is just a single snapshot of a project, specified by a commit hash; "vanilla" simply means the pure project files.

> Or do you still have the saved commit IDs of each project?

We ran our tool over all the commits in each project, so there was no need to save any commit IDs after our experiment. Literally, we archived all available versions of a project and applied FindBugs to each of them.
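A hedged sketch of what "running over all the commits" could look like, using plain git rev-list (an illustration under that assumption, not the repo-iterator's actual code; the paths are placeholders):

```scala
import scala.sys.process._

// Enumerate every commit hash of a repo (oldest first) and archive each snapshot.
val gitDir = "/path/to/target-project" // placeholder
val hashes = Seq("git", "-C", gitDir, "rev-list", "--reverse", "--all").!!
  .trim.split("\n").toList
hashes.foreach { h =>
  Seq("git", "-C", gitDir, "archive", "--format=zip",
      s"--output=/tmp/archives/$h.zip", h).!
}
```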

kimgimkigi commented 2 years ago

@darkrsw

Hi.

Now I'm trying to execute UnfixedAlarmCollector.scala.

My target project is "acceptance-test-harness", one of your dataset projects. Before starting, I set my Neo4j connection info in VioDBFacade.scala.

When I run UnfixedAlarmCollector.scala, there seems to be no problem with GitProxy or Neo4j, but I get nothing out of the script.

I checked perProject.size and it returns 21, but I get the following console output:

Neo4J query completed: 0

Node after filtering: 0

Original Neo4J query completed: 0

Node after filtering: 0

I think this is related to the Neo4j query below (UnfixedAlarmCollector.scala, line 58 onward):

val results = VioDBFacade.session.run(
  s"""match (n:Violation {project: '$project'})
      where NOT (n)-[:CHILD]->(:Violation)
      return n""")

Do I have to do something in Neo4j or elsewhere before running this script?

Thanks!

darkrsw commented 2 years ago

@kimgimkigi

> Now I'm trying to execute UnfixedAlarmCollector.scala.

Basically, the script assumes that the alarm data have already been inserted into the DB.

> My target project is "acceptance-test-harness", one of your dataset projects. Before starting, I set my Neo4j connection info in VioDBFacade.scala.

Again, setting up the DB is not enough. You need to put the data in first.

> When I run UnfixedAlarmCollector.scala, there seems to be no problem with GitProxy or Neo4j, but I get nothing out of the script.

> I checked perProject.size and it returns 21, but I get the following console output:

perProject depends on the summary file rather than the DB; that is why its result is not 0.

> Neo4J query completed: 0
> Node after filtering: 0
> Original Neo4J query completed: 0
> Node after filtering: 0

> I think this is related to the Neo4j query below (UnfixedAlarmCollector.scala, line 58 onward):

>     val results = VioDBFacade.session.run(
>       s"""match (n:Violation {project: '$project'})
>           where NOT (n)-[:CHILD]->(:Violation)
>           return n""")

> Do I have to do something in Neo4j or elsewhere before running this script?

To get results from that query, you first need to run another script that collects the alarms and inserts them into the DB.
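For illustration only, a hypothetical seeding snippet matching just what the query above reveals about the schema (a Violation label, a project property, and CHILD edges between violations); the real alarm-collecting script surely stores more fields:

```scala
// Hypothetical sketch: only the Violation label, the `project` property, and the
// CHILD relation are visible in the query quoted above; the real schema is unknown here.
val project = "acceptance-test-harness"
VioDBFacade.session.run(
  s"""CREATE (parent:Violation {project: '$project'})
     |CREATE (child:Violation {project: '$project'})
     |CREATE (parent)-[:CHILD]->(child)""".stripMargin)
// With this seed, the query above would return `child`,
// since it has no outgoing CHILD edge.
```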

lxyeah commented 2 years ago

Hi! @kimgimkigi

I found that you had configured this project when I browsed this issue, and I'm trying to configure it too. But when I used Maven to package the subproject "parsing-utils", one dependency could not be resolved:

```xml
<dependency>
    <groupId>org.eclipse.jface</groupId>
    <artifactId>org.eclipse.jface.text</artifactId>
    <version>3.3.0</version>
</dependency>
```

Have you ever encountered this problem? Could you tell me how to resolve it?

kimgimkigi commented 2 years ago

Hi @lxyeah

I also struggled with that problem. I solved it by using the jface version below:

https://mvnrepository.com/artifact/org.eclipse.jface/text/3.3.0-v20070606-0010

I roughly set up findbugs-violations, but in the end I could not reuse it because I did not understand the environment well enough. Good luck!

lxyeah commented 2 years ago

Hi @kimgimkigi Your help was very much appreciated; I can now build all the subprojects, and I have also configured the Neo4j database. I saw in the earlier conversation that "setting up the DB is not enough. You need to put the data in first." So do you know what data I should insert into the database, and in what format? And could you tell me what else I need to do to run UnfixedAlarmCollector.scala?

Thanks a lot!

lxyeah commented 2 years ago

Hi! @darkrsw Could you provide us with a detailed document describing how to run the whole project? For example: what we should do before starting, what data we should insert into the database, which scripts we should run, and the overall execution sequence. I'm very sorry to disturb you during your working hours; I sincerely hope you can give us some help. Looking forward to your reply.

kimgimkigi commented 2 years ago

@lxyeah

> So do you know what data I should insert into the database, and in what format?

I'm not familiar with the Neo4j database, so I failed to get past that point. I'm sorry I could not be of more help.

lxyeah commented 2 years ago

@kimgimkigi Anyway, thank you! If you make any progress, please share it with me! :-)

kimgimkigi commented 2 years ago

@darkrsw

Hi.

I have a simple question: what are the labels of the data in the CNN feature extraction, for each code pattern and fix pattern? I cannot build your training data, so it is hard to find out. Thanks!

darkrsw commented 2 years ago

@kimgimkigi

I don't get your question. The CNN does not extract any specific feature; it embeds each patch, not the pattern.

@Kui-Liu Can you explain to @kimgimkigi how to build the training data for the CNN model?

kimgimkigi commented 2 years ago

@darkrsw

As I understand it, for a single pattern, the word2vec token vectors of the pattern are fed into the CNN model to get learned discriminating feature vectors. Those feature vectors are then used in X-means clustering to figure out the clusters of code and fix patterns.

I understand what the input data of the CNN model is (word2vec token vectors). My question is: what is the label of each of those inputs during training?

Thanks.

darkrsw commented 2 years ago

@kimgimkigi

No. For a single patch, the word2vec token vectors of the patch are fed into the CNN model to get learned discriminating feature vectors. X-means clustering is then used to identify fix patterns from several common, similar patches.

> I understand what the input data of the CNN model is (word2vec token vectors). My question is: what is the label of each of those inputs during training?

There is no label information for those input data. Basically, the CNN model is used for autoencoding, so no label information is necessary (i.e., input vector === output vector).
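To make the input === output idea concrete, here is a hedged sketch using DL4J (the library the INDArray code later in this thread implies). It uses a plain dense autoencoder rather than the project's actual CNN, and every size and hyperparameter below is made up:

```scala
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.lossfunctions.LossFunctions

// Illustrative autoencoder: compress 300-dim token vectors to 100 dims,
// then reconstruct them. The training target is the input itself, so no labels.
val conf = new NeuralNetConfiguration.Builder()
  .list()
  .layer(0, new DenseLayer.Builder().nIn(300).nOut(100)
    .activation(Activation.RELU).build())
  .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
    .nIn(100).nOut(300).activation(Activation.IDENTITY).build())
  .build()

val model = new MultiLayerNetwork(conf)
model.init()

val vectors = Nd4j.rand(64, 300) // stand-in for word2vec token vectors
model.fit(vectors, vectors)      // autoencoding: target === input
```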

lxyeah commented 2 years ago

Hi! @kimgimkigi I have generated vectorizedTokens.csv and am now trying to run PatternMiner.java, but I've hit a problem: when the program reaches CNNFeatureLearner.java, it needs to read the features from the output layer with the following source code:

```java
INDArray input = model.getOutputLayer().input();
features.append(input.toString().replace("[[", "").replaceAll("\\],", "")
    .replaceAll(" \\[", "").replace("]]", "") + "\n");
```

However, I found that input is null. Have you ever had this problem? Could you tell me how to solve it, or give me some advice?
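(Not an authoritative answer, but one hedged guess: in DL4J a layer's input() is only populated during a forward or backward pass and may be cleared afterwards by workspaces, so reading it outside fit() can return null. A sketch of an alternative, assuming model and vectors are the trained network and its input matrix from the surrounding code:)

```scala
// Hedged sketch: run an explicit forward pass and take the activation that
// feeds the output layer instead of relying on the cached input() field.
val activations = model.feedForward(vectors, false) // java.util.List[INDArray]
// Element 0 is the network input; the last element is the output layer's
// activation, so the second-to-last element is the input *to* the output layer.
val outputLayerInput = activations.get(activations.size() - 2)
```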