TSELab / guac-alytics

A series of tools and resources to better understand the risk profile of open source software ecosystems
Apache License 2.0

Handle popcon data #2

Closed: SantiagoTorres closed this issue 11 months ago

SantiagoTorres commented 1 year ago

We need to handle the popcon data dump. As with #1, we want to handle this file type with a script/library under scripts/ingestion. The expected schema should follow the popcon tabular description:

#rank name inst vote old recent no-files (maintainer)

(see the rest of the format description):

#Format
#
#<name> is the package name;
#<inst> is the number of people who installed this package;
#<vote> is the number of people who use this package regularly;
#<old> is the number of people who installed, but don't use this package
#      regularly;
#<recent> is the number of people who upgraded this package recently;
#<no-files> is the number of people whose entry didn't contain enough
#           information (atime and ctime were 0).
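
A minimal sketch of how such an ingestion script might parse the tabular format described above. The names `PopconRow` and `parse_popcon` are hypothetical and not taken from the repository; the sketch assumes each data row is whitespace-separated with the maintainer in trailing parentheses, as the header line suggests, and it skips comment lines and anything that does not match that shape.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class PopconRow(NamedTuple):
    """One data row of a popcon dump (hypothetical container, not from the repo)."""
    rank: int
    name: str
    inst: int
    vote: int
    old: int
    recent: int
    no_files: int
    maintainer: str


# Assumed row shape: rank name inst vote old recent no-files (maintainer)
_ROW = re.compile(
    r"^(\d+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+\((.*)\)\s*$"
)


def parse_popcon(lines: Iterable[str]) -> Iterator[PopconRow]:
    """Yield a PopconRow for every line that matches the assumed row shape."""
    for line in lines:
        # Lines starting with '#' carry the header and the format description.
        if line.startswith("#"):
            continue
        match = _ROW.match(line)
        if match is None:
            # Separators, blank lines, or anything else that is not a data row.
            continue
        rank, name, inst, vote, old, recent, no_files, maintainer = match.groups()
        yield PopconRow(
            int(rank), name, int(inst), int(vote),
            int(old), int(recent), int(no_files), maintainer,
        )
```

Usage would look something like `rows = list(parse_popcon(open(path, encoding="utf-8", errors="replace")))`, where `path` points at the downloaded dump; the encoding handling is likewise an assumption.
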
SahithiKasim commented 1 year ago

I have built the popularity table (popcon) with this schema:

name inst vote old recent no-files maintainer inst_norm votenorm

You can find it at data/yellow/vineet/database/bi_multi_tables.db/popularity_table.

SantiagoTorres commented 1 year ago

I would like to see the script rather than the table.

sbrunswi commented 11 months ago

@SantiagoTorres - I think this is complete! What was the commit for this?

SahithiKasim commented 11 months ago

Check the updated code from #4!