Order of data processing tasks (please document this)

rykahsay commented 1 week ago

This is the ordered list of dataset creation steps:

1. Make protein datasets part-I (protein datasets that do not depend on proteoform datasets)
2. Make glycan datasets
3. Make proteoform datasets
4. Make protein datasets part-II (protein datasets that depend on proteoform datasets)
5. Update medline, pubchem, pdb, alphafold data (for the new in the release)
6. Update citation datasets
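
For whoever writes this up, the order above could be captured as data so a release script can check it. This is only a rough sketch: the short step keys and the `next_step` helper are hypothetical names made up for illustration, not part of the existing GlyGen tooling.

```python
# Release steps in the order listed above.
# The short keys are hypothetical labels, not names used by the real scripts.
STEPS = [
    ("protein-part-1", "Protein datasets that do not depend on proteoform datasets"),
    ("glycan", "Glycan datasets"),
    ("proteoform", "Proteoform datasets"),
    ("protein-part-2", "Protein datasets that depend on proteoform datasets"),
    ("external", "medline, pubchem, pdb, alphafold data"),
    ("citation", "Citation datasets"),
]


def next_step(completed):
    """Return the key of the next step to run, or None when all are done.

    Raises ValueError if `completed` is not a prefix of the listed order.
    """
    order = [key for key, _ in STEPS]
    done = list(completed)
    if done != order[:len(done)]:
        raise ValueError(f"steps run out of order: {done}")
    return order[len(done)] if len(done) < len(order) else None
```

Calling `next_step(["protein-part-1", "glycan"])` would point at the proteoform step, and passing steps in any other order raises an error instead of silently continuing.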

How to tell which protein datasets are created in part-II

$ cd /software/glygen/
$ python wrap-check-datasets-all.py -m protein > logs/protein.log
$ grep failed logs/protein.log | grep DEPENDENT-proteoform-ds
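
If the check needs to run from a Python release script instead of the shell, the same filter could look roughly like this. It is a sketch only: it assumes nothing about the log format beyond what the grep commands rely on (both markers appearing on the same line), and `dependent_failures` is a hypothetical helper name.

```python
def dependent_failures(log_lines):
    """Yield log lines that contain both markers used by the grep pipeline.

    Assumption: a part-II dataset shows up as a single line containing
    both "failed" and "DEPENDENT-proteoform-ds".
    """
    for line in log_lines:
        if "failed" in line and "DEPENDENT-proteoform-ds" in line:
            yield line.rstrip("\n")


# Example use (run after wrap-check-datasets-all.py has written the log):
#     with open("logs/protein.log") as fh:
#         for line in dependent_failures(fh):
#             print(line)
```

This keeps the substring matching of the two grep calls, so it should report the same lines as the shell version.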

jeet-vora commented 3 days ago

@ReneRanzinger The above workflow, handled by Robel, needs to be added to the release schedule accordingly. We can talk more when you are closer to making the data release schedule.

ReneRanzinger commented 3 days ago

Any task that needs to be added should go into the "Milestone" document in the "Tasks planning" > "GlyGen - version 2.7" Sharepoint folder. When I start planning the next release, this document is copied into the new folder. The file is currently maintained by Urnisha and Kate. The "Development" and "Testing" sheets hold the main tasks for the two phases.

Any task you want to add as a recurring task for each release should be in these sheets.

glygener / glygen-issues

Order of data processing tasks (please document this) #1814