glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Script for detecting changes in the predicates of ebi nt files #1770

Status: Closed (katewarner closed 2 months ago)

katewarner commented 2 months ago

As discussed during the developer meeting, please make a script which will:

  1. Compare the predicates in the individual ebi NT files (e.g. uniprot-proteome-arabidopsis-thaliana.nt) between the current and previous release, and then
  2. Output any differences between the releases to help us detect large data changes.

We will then document the script in the Backend workflow doc.
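
The comparison requested above can be sketched roughly as follows. This is a minimal illustration, not the script that was eventually written: the function names `count_predicates` and `diff_counts` are hypothetical, the example triples are made up, and the row layout (difference, direction, new count, old count, IRI) is only an assumption about a convenient output format.

```python
import re
from collections import Counter

# Matches the subject (IRI or blank node) and captures the predicate IRI
# at the start of an N-Triples line. Comment and blank lines do not match.
TRIPLE_RE = re.compile(r'^\s*(?:<[^>]*>|_:\S+)\s+(<[^>]*>)')


def count_predicates(lines):
    """Count how often each predicate IRI occurs in an iterable of NT lines."""
    counts = Counter()
    for line in lines:
        match = TRIPLE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def diff_counts(old, new):
    """Return rows (abs_diff, direction, new_count, old_count, iri) for changed counts."""
    rows = []
    for iri in set(old) | set(new):
        old_n, new_n = old.get(iri, 0), new.get(iri, 0)
        if old_n != new_n:
            direction = "increased" if new_n > old_n else "decreased"
            rows.append((abs(new_n - old_n), direction, new_n, old_n, iri))
    # Largest change first; ties are broken by IRI so the order is stable.
    rows.sort(key=lambda row: (-row[0], row[4]))
    return rows


if __name__ == "__main__":
    # Made-up triples standing in for the previous and current release files.
    old_lines = [
        '<http://example.org/s1> <http://example.org/p1> "a" .',
        '<http://example.org/s2> <http://example.org/p1> "b" .',
        '<http://example.org/s3> <http://example.org/p2> "c" .',
    ]
    new_lines = [
        '<http://example.org/s1> <http://example.org/p1> "a" .',
        '<http://example.org/s3> <http://example.org/p2> "c" .',
        '<http://example.org/s4> <http://example.org/p2> "d" .',
    ]
    for row in diff_counts(count_predicates(old_lines), count_predicates(new_lines)):
        print(",".join(str(field) for field in row))
```

For real release files, `count_predicates` can be given an open file handle instead of a list, so the NT file is streamed line by line rather than loaded into memory.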

rykahsay commented 2 months ago

Run the following commands (here -o and -n presumably take the previous and the new release dates):

$ cd /software/glygen
$ nohup python3 dump-ebi-stats.py -o 2024_06_20 -n 2024_09_19 &

When the above command finishes, there will be output files named logs/uniprot-proteome-*.nt.stat.log. Use the following command to see the largest changes in dataset type counts for one of them:

$ sort -nr logs/uniprot-proteome-saccharomyces-cerevisiae.nt.stat.log | head

11998,decreased,88446,100444,<http://biohackathon.org/resource/faldo#Region>
10576,decreased,6328,16904,<http://biohackathon.org/resource/faldo#ExactPosition>
2946,decreased,206232,209178,<http://purl.uniprot.org/core/Resource>
225,increased,8306,8081,<https://sparql.glygen.org/ontology/Xref_Identifier>
208,increased,23440,23232,<http://purl.uniprot.org/core/Helix_Annotation>
184,decreased,5796,5980,<http://www.w3.org/2002/07/owl#Class>
69,increased,7447,7378,<http://purl.uniprot.org/core/Binding_Site_Annotation>
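
Sorting by the first column ranks absolute changes, which favours types that are already large. As a possible follow-up, the sketch below reads the same lines and ranks them by relative change instead. It assumes the columns are: absolute difference, direction, new count, old count, type IRI. That reading is inferred from the sample output above (the first column equals the difference of the two counts in every row) and is not confirmed in this thread. The function names and the 10% threshold are hypothetical.

```python
# Sample rows copied from the log output shown above.
SAMPLE_LOG = """\
11998,decreased,88446,100444,<http://biohackathon.org/resource/faldo#Region>
10576,decreased,6328,16904,<http://biohackathon.org/resource/faldo#ExactPosition>
2946,decreased,206232,209178,<http://purl.uniprot.org/core/Resource>
225,increased,8306,8081,<https://sparql.glygen.org/ontology/Xref_Identifier>
208,increased,23440,23232,<http://purl.uniprot.org/core/Helix_Annotation>
184,decreased,5796,5980,<http://www.w3.org/2002/07/owl#Class>
69,increased,7447,7378,<http://purl.uniprot.org/core/Binding_Site_Annotation>
"""


def parse_stat_line(line):
    """Split one log line; the column meaning is an assumption (see above)."""
    diff, direction, new, old, iri = line.strip().split(",", 4)
    return {
        "diff": int(diff),
        "direction": direction,
        "new": int(new),
        "old": int(old),
        "iri": iri,
    }


def relative_change(row):
    """Absolute difference as a fraction of the old count."""
    if row["old"] == 0:
        return float("inf")  # type is new in this release
    return row["diff"] / row["old"]


def flag_large_changes(lines, threshold=0.10):
    """Return (relative_change, row) pairs at or above the threshold, largest first."""
    rows = [parse_stat_line(line) for line in lines if line.strip()]
    flagged = [(relative_change(row), row) for row in rows]
    flagged = [pair for pair in flagged if pair[0] >= threshold]
    flagged.sort(key=lambda pair: pair[0], reverse=True)
    return flagged


if __name__ == "__main__":
    for change, row in flag_large_changes(SAMPLE_LOG.splitlines()):
        print(f"{change:.1%} {row['direction']} {row['iri']}")
```

With the sample rows and a 10% threshold, only the two faldo types are flagged, and ExactPosition ranks above Region even though its absolute change is smaller.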

katewarner commented 2 months ago

Works great! Many thanks