chains-project / sbom-files

Long-term storage of software bills of materials (SBOMs): https://arxiv.org/pdf/2303.11102.pdf

Update dataset with links to corresponding GitHub repo branch/tag #3

Closed javierron closed 1 year ago

javierron commented 1 year ago

Add another column to the dataset file with a direct link to the source code of each item, e.g. for jenkins-core -> repo.

@algomaster99 Maybe we have these already?

algomaster99 commented 1 year ago

Hi @javierron! I have done that in https://github.com/chains-project/sbom-files/pull/2. I linked each item directly to the commit, not the tag. I will merge it right away.

javierron commented 1 year ago

Thanks @algomaster99! What's with the missing commit for spoon-core? I could not find it either.

@MartinWitt Do you know if this is available somewhere?

MartinWitt commented 1 year ago

Spoon beta commit hashes do not exist. This is because we change the version in the pom and create the release locally. The version change, e.g. from 10.3.0 to 10.3.0-BETA-13, is never pushed to GitHub and exists only on the local machine. Therefore, the hash does not exist.

javierron commented 1 year ago

Thanks @MartinWitt, I understand. Can we use another commit as a proxy for computing statistics of the repo at that point in time (lines of code, number of classes, etc.)?

algomaster99 commented 1 year ago

@javierron, @MartinWitt suggested using the stable releases of spoon-core (listed under GitHub releases), not the beta versions.

algomaster99 commented 1 year ago

@javierron please notice the updates in sbom-dataset.md. There were some changes today.

javierron commented 1 year ago

@algomaster99 @MartinWitt

WDYT about updating the commit column to point to the state of the repo at that commit, and also to the corresponding subdir of the module, e.g. jenkins-core -> commit?

algomaster99 commented 1 year ago

We also discussed not running tools on submodules because they usually cause dependency resolution problems. So for any project, consider it from the root of the project or the topmost pom. For example, jenkins-core and jenkins-cli will become just one project in that case.
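To illustrate the "topmost pom" idea, here is a minimal sketch, assuming a local checkout of the repository. The function name `topmost_pom_dir` and the directory layout in the usage note are hypothetical and not part of the dataset scripts.

```python
from pathlib import Path
from typing import Optional


def topmost_pom_dir(module_dir: Path, repo_root: Path) -> Optional[Path]:
    """Return the highest directory between module_dir and repo_root
    (inclusive) that contains a pom.xml, or None if there is none.

    Hypothetical helper: it only inspects the file system and does not
    parse <parent> or <modules> declarations inside the poms.
    """
    module_dir = module_dir.resolve()
    repo_root = repo_root.resolve()
    if repo_root != module_dir and repo_root not in module_dir.parents:
        raise ValueError("module_dir must be inside repo_root")

    top = None
    current = module_dir
    while True:
        # Keep overwriting: the last hit while walking up is the topmost one.
        if (current / "pom.xml").is_file():
            top = current
        if current == repo_root:
            break
        current = current.parent
    return top
```

With a checkout where both `<root>/pom.xml` and `<root>/core/pom.xml` exist, calling `topmost_pom_dir(root / "core", root)` returns the root, so two submodules of the same repository map to a single project.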

> commit column to point to the state of the repo

Most of the commits were recorded via a script, but some of them were entered manually, so I don't want to change the column. What you can do in your script is take the commit hash and use the URL https://github.com/jenkinsci/jenkins/tree/<commit-hash> to go to the state of the repo at that commit.
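The URL construction could be sketched as below. The function name `tree_url` and its parameters are hypothetical. The optional `subdir` argument assumes that appending a path after the commit hash opens that directory at that commit, which is how GitHub tree URLs normally behave.

```python
from urllib.parse import quote


def tree_url(owner: str, repo: str, commit: str, subdir: str = "") -> str:
    """Build a GitHub URL showing the repo (or a subdirectory) at a commit.

    Hypothetical helper; owner, repo, and commit come from the dataset row.
    """
    base = f"https://github.com/{owner}/{repo}/tree/{commit}"
    subdir = subdir.strip("/")
    # quote() keeps "/" intact by default, so nested paths stay readable.
    return f"{base}/{quote(subdir)}" if subdir else base
```

For example, `tree_url("jenkinsci", "jenkins", commit_hash)` gives the repository root at that commit, and passing a module's subdirectory (such as `"core"`, assuming that is where the module lives) points at the module instead.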