NCATSTranslator / Knowledge_Graph_Exchange_Registry

The Biomedical Data Translator Consortium site for development of Knowledge Graph Exchange Standards and Registry
MIT License

.tsv upload is truncating files #50

Closed jeffhhk closed 3 years ago

jeffhhk commented 3 years ago

I uploaded two files into yeast-sri-reference-kg-tsv version 1.2. These were the line counts of the uploaded files:

13759207 edges.tsv
 2816795 nodes.tsv
16576002 total

But when I download the archive from KGE, the line counts are lower:

13710877 edges.tsv
 2749276 nodes.tsv
16460153 total

I've uploaded only two complete file sets, and the sizes were truncated both times, so it appears to be easy to reproduce.

The last line of the downloaded file usually appears truncated.
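A comparison like the one above can be scripted. The following is a minimal sketch, assuming bash with standard `wc` and `tail`; the function name `summarize` is hypothetical and not part of KGE. It prints the line count, the byte size, and whether the file ends with a newline. A missing final newline is only a hint that the last line was cut off, not proof.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KGE): summarize one file so that an
# uploaded copy and a downloaded copy can be compared side by side.
summarize() {
  local f=$1
  local lines bytes ends
  lines=$(wc -l < "$f" | tr -d ' ')   # number of newline characters
  bytes=$(wc -c < "$f" | tr -d ' ')   # size in bytes
  # tail -c 1 prints the final byte. Command substitution strips a
  # trailing newline, so an empty result means the file ends with "\n".
  if [ -s "$f" ] && [ -z "$(tail -c 1 "$f")" ]; then
    ends=yes
  else
    ends=no
  fi
  printf '%s lines=%s bytes=%s ends_with_newline=%s\n' "$f" "$lines" "$bytes" "$ends"
}
```

Running `summarize` on the uploaded and the downloaded `nodes.tsv` and comparing the two output lines should show both the lower line count and, in most cases, `ends_with_newline=no` for a truncated download.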

jeffhhk commented 3 years ago

This issue looks resolved to me. Here is my comparison:

  $ cat ~/kge-issue-50/uploaded/yeast-sri-reference/0.3.0b/tsv/nodes.tsv | nl -ba | tail -3
  2816793   ZFIN:ZDB-SNORNAG-120314-7   snord31.3   biolink:NamedThing|biolink:GenomicEntity|biolink:Gene   small nucleolar RNA, C/D box 31.3       zfin                                        owl:Class   
  2816794   ZFIN:ZDB-SNORNAG-150916-2   snord7  biolink:NamedThing|biolink:GenomicEntity|biolink:Gene   small nucleolar RNA, C/D box 7      zfin        owl:Class   
  2816795   ZFIN:ZDB-SNORNAG-200824-1   snord69 biolink:NamedThing|biolink:GenomicEntity|biolink:Gene   small nucleolar RNA, C/D box 69     zfin        owl:Class   
  $ cat nodes.tsv  | nl -ba | tail -3
  2816794   ZFIN:ZDB-SNORNAG-150916-2   snord7  biolink:NamedThing|biolink:GenomicEntity|biolink:Gene   small nucleolar RNA, C/D box 7      zfin        owl:Class   
  2816795   ZFIN:ZDB-SNORNAG-200824-1   snord69 biolink:NamedThing|biolink:GenomicEntity|biolink:Gene   small nucleolar RNA, C/D box 69     zfin        owl:Class   
  2816796   
  $ ls -l ~/kge-issue-50/uploaded/yeast-sri-reference/0.3.0b/tsv/nodes.tsv nodes.tsv 
  -rw-rw-r-- 1 jeff jeff 550035140 Jul 22 14:57 /home/jeff/kge-issue-50/uploaded/yeast-sri-reference/0.3.0b/tsv/nodes.tsv
  -rw-r--r-- 1 jeff jeff 550035141 Sep 10 10:09 nodes.tsv

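The only difference is that you added an end of line to the file, which makes the download one byte larger than the upload (550035141 vs. 550035140 bytes). I probably would have tried to avoid that, but it seems fine so long as the format is always text-based (TSV).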
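To confirm that an appended newline is the only difference between two copies, a check along these lines could be used. This is a sketch under assumptions: bash with standard `wc`, `head`, `tail`, and `cmp`, and a hypothetical function name `differs_only_by_trailing_newline` that is not part of KGE.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KGE): succeed only if the downloaded
# file equals the uploaded file plus exactly one extra trailing newline.
differs_only_by_trailing_newline() {
  local up=$1 down=$2
  local n m
  n=$(wc -c < "$up" | tr -d ' ')
  m=$(wc -c < "$down" | tr -d ' ')
  # The download must be exactly one byte larger than the upload.
  [ "$m" -eq $((n + 1)) ] || return 1
  # The first n bytes of the download must match the upload byte for byte.
  head -c "$n" "$down" | cmp -s - "$up" || return 1
  # The one extra byte must be a newline (command substitution strips it,
  # leaving an empty string).
  [ -z "$(tail -c 1 "$down")" ]
}
```

For example, `differs_only_by_trailing_newline uploaded/nodes.tsv nodes.tsv && echo same` would print `same` only when the content is otherwise identical; the paths here are placeholders.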