commercetest / nlnet

Analysis of the open-source codebases of NLnet-sponsored projects.
MIT License

Generate RDF files with project info #5

Closed julianharty closed 7 months ago

julianharty commented 8 months ago

Context

We'd like to be able to save the project-related data we obtain so it can be combined by us and by others with the parent data of the NLnet projects. NLnet uses RDF data formats for the parent data and can accept and combine related RDF data if we provide it.

Task

Once the RDF structures have been defined (which will probably be on https://codeberg.org/NLnet/importscripts), generate query results exported using these structures.
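
Until the RDF structures are defined, the export step can only be sketched. The following is a minimal illustration, assuming N-Triples as the output format and a placeholder vocabulary: the `https://example.org/nlnet-analysis#` namespace and the property names `repository`, `testFileCount` and `lastCommitHash` are hypothetical and would be replaced by whatever NLnet's import scripts settle on. It uses only the Python standard library and assumes the URLs contain no characters that are illegal inside an N-Triples IRI.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
# Hypothetical namespace: the real RDF structures had not been defined yet.
VOCAB = "https://example.org/nlnet-analysis#"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def project_to_ntriples(project_url, repo_url, test_file_count, last_commit_hash=None):
    """Return N-Triples lines describing one project (placeholder vocabulary)."""
    subject = f"<{project_url}>"
    lines = [
        f"{subject} <{VOCAB}repository> <{repo_url}> .",
        f'{subject} <{VOCAB}testFileCount> "{int(test_file_count)}"^^<{XSD_INTEGER}> .',
    ]
    # Only emit the hash when one is known (a missing hash is simply left out).
    if last_commit_hash:
        lines.append(
            f'{subject} <{VOCAB}lastCommitHash> "{escape_literal(last_commit_hash)}" .'
        )
    return lines


if __name__ == "__main__":
    for line in project_to_ntriples(
        "https://nlnet.nl/project/Seedvault/",
        "https://github.com/seedvault-app/seedvault",
        94,
    ):
        print(line)
```

Using the NLnet project page as the subject is one possible choice; it would let the exported triples join onto the parent data if NLnet identifies projects the same way, which is an assumption to confirm.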

julianharty commented 8 months ago

The RDF file formats are listed in the Apache Jena I/O documentation: https://jena.apache.org/documentation/io/

julianharty commented 7 months ago

Some useful sources of information about processing RDF include:

tnzmnjm commented 7 months ago

----- repourl: https://github.com/siacs/Conversations --> when I opened it, the GitHub URL is now https://github.com/iNPUTmice/Conversations --> the README says the code has moved: "A New Home. We've moved. Conversations is now developed on Codeberg."

----- Moved to Codeberg: https://github.com/librestack/librecast

----- Some repos do not have any test files: https://github.com/Ayms/node-Tor, https://github.com/pylls/padding-machines-for-tor, https://github.com/arpa2/draft-vanrein-tls-kdh, https://github.com/NLnetLabs/dnssec-ceremony-doc, https://github.com/sensifai/Sensifai-NPU-SDK, https://github.com/blueprint-freespeech/refresh-site, https://github.com/arpa2/draft-vanrein-httpauth-sasl, https://github.com/NLnetLabs/connectbyname

----- In some repos, when I searched for the word "test", I realised it is only mentioned as text in an issue, not in a test file, so it's fine for the script to return 0: https://github.com/NLnetLabs/dnssec-ceremony-tools, https://github.com/simmel-project/hardware, https://github.com/MEGA65/megaphone-r4-pcb, https://github.com/stef/zphinx-zerver, https://github.com/beeldengeluid/peertube-plugin-creative-commons, https://github.com/beeldengeluid/extending-peertube, https://github.com/jobisoft/TbSync, https://github.com/otrv4/otrv4, https://github.com/rust-threadpool/rust-threadpool, https://github.com/FOSDEM/video-hardware

----- Some repourls point to an issue rather than an owner+repo, for example: https://github.com/osresearch/heads/issues/540
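
One way to detect such repourls is to parse the URL path and check whether it is exactly `/owner/repo`. The sketch below assumes GitHub-style URLs only; the function names `github_owner_repo` and `is_plain_repo_url` are made up for illustration and are not part of the existing script.

```python
from urllib.parse import urlparse


def _path_parts(url: str):
    """Split the URL path into its non-empty segments."""
    return [part for part in urlparse(url.strip()).path.split("/") if part]


def github_owner_repo(url: str):
    """Return (owner, repo) for a GitHub URL, or None if it has no owner+repo."""
    host = urlparse(url.strip()).netloc.lower()
    if host not in ("github.com", "www.github.com"):
        return None
    parts = _path_parts(url)
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    # Clone URLs such as .../odftoolkit.git carry a ".git" suffix.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def is_plain_repo_url(url: str) -> bool:
    """True when the path is exactly /owner/repo, with nothing after it."""
    return github_owner_repo(url) is not None and len(_path_parts(url)) == 2


if __name__ == "__main__":
    for url in (
        "https://github.com/osresearch/heads/issues/540",
        "https://github.com/tdf/odftoolkit.git",
    ):
        print(url, github_owner_repo(url), is_plain_repo_url(url))
```

Whether a URL that points at an issue or a branch should be reduced to its parent repository and cloned anyway, or flagged for manual review, is a judgment call this sketch leaves open.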

Need to address:

  1. https://github.com/tdf/odftoolkit.git, https://github.com/eduvpn/apple and https://github.com/stratosphereips/AIVPN --> test count = 0 ("test" is mentioned in issues but not in a filename, so that's correct) --> the latest hash is nan. Not sure why.
  2. This points to an issue, not an owner+repo: https://github.com/osresearch/heads/issues/540
  3. https://github.com/seedvault-app/seedvault has loads of test files, so I'm not sure why the test count is 0 --> the test files end with .kt, for instance App/src/test/java/com/stevesoltys/seedvault/crypto/CryptoTest.kt --> I could not see this repo among the cloned repos on the hard disk either (after the second run, I can see the repo) ---- same with https://github.com/jitsi/jitsi-meet and https://github.com/newaetech/chipwhisperer (I can see test files)
tnzmnjm commented 7 months ago

Investigation:

    • https://github.com/tdf/odftoolkit.git ---> This repo is about 600 MB, has 18 contributors and 18 branches, and I can see loads of test files on the web. I ran list_test_files on this repo alone and it found 9667 items.
    • https://github.com/stratosphereips/AIVPN ---> The script returned 0 after running on this repo again. That is correct: I checked on the web and cannot see any filenames or paths containing the word "test", although one file seems to test something: docs/build/_static/language_data.js
  1. Got the list of the repourls where testfilecountlocal = 0 and last_commit_hash is not nan:
        • 50 https://github.com/osresearch/heads/issues/540 ---> this points to an issue, not an owner+repo
        • 132 https://github.com/eez-open/modular-psu ---> this is correct; checked on the web
        • 247 https://github.com/ernestwisniewski/kbin ---> repo is cloned (19 MB); ran the function and the result is 0, but I can see test files on the web
        • 257 https://github.com/armijnhemel/binaryanalysis-... ---> this doesn't point to an owner+repo
        • 264 https://github.com/organicmaps/organicmaps ---> this hasn't been cloned
        • 270 https://github.com/chromi/sce ---> this hasn't been cloned
        • 271 https://github.com/Wakoma/nimble/tree/smart_doc ---> this doesn't point to an owner+repo
        • 288 https://github.com/overte-org/overte ---> this hasn't been cloned

    • https://github.com/seedvault-app/seedvault ---> found 2 projects pointing at the same repo; their NLnet pages are different (https://nlnet.nl/project/Seedvault/ and https://nlnet.nl/project/SeedVault-Integrity/). It has loads of test files on the web. The cloned repo is about 300 MB. Ran the test count function again ---> found 94 test files.
    • https://github.com/jitsi/jitsi-meet ---> cloned repo is 416 MB; ran the count function again ---> found 13 test files.
    • https://github.com/newaetech/chipwhisperer ---> cloned repo is 1.1 GB; ran the count function again ---> found 224 test files.
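
For reference, the kind of local count described above can be sketched as follows. This is not the project's list_test_files function; it is a hypothetical stand-in (`looks_like_test_path` and `find_test_paths` are illustrative names) that assumes a file counts as a test file when "test" appears anywhere in its path relative to the repo root, ignoring case.

```python
from pathlib import Path


def looks_like_test_path(relative_path: str) -> bool:
    """Assumed rule: 'test' appears somewhere in the relative path."""
    return "test" in relative_path.lower()


def find_test_paths(repo_dir):
    """Return sorted relative paths of files in a cloned repo that match the rule."""
    root = Path(repo_dir)
    matches = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Skip git's own metadata so hooks and refs are not counted.
        if ".git" in relative.parts:
            continue
        if path.is_file() and looks_like_test_path(relative.as_posix()):
            matches.append(relative.as_posix())
    return sorted(matches)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        found = find_test_paths(sys.argv[1])
        print(f"{len(found)} test files found")
```

A repo that has not been cloned yet yields an empty list here, which would look the same as a repo with no tests; that matches the zero counts that turned into 94, 13 and 224 once the clones were present. Substring matching also has false positives (a path such as notes/latest.txt would match), so the counts are best treated as approximate.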