Closed acevedol closed 2 years ago
Hi Lili, this plan (splitting edges.tsv) sounds like a practical option. Thank you. I assume this applies just to KG2pre, right? i.e., I assume KG2c still fits?
Yes, this is just for KG2pre, and just for the edges. I tried this out with a script, split_kgx_edges_tsv.py, and it produced files that compress small enough to upload. I'm not sure whether this needs to be part of the KGX TSV build process, but I'm going to upload the file for future use.
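For readers following along, here is a minimal sketch of how such a split could work. It is not the actual split_kgx_edges_tsv.py; the function name `split_tsv`, its parameters, and the `edges1.tsv`, `edges2.tsv`, ... output naming are assumptions for illustration. It also assumes the first line of edges.tsv is a header row that should be repeated in every part, and it streams line by line so the 34GB+ file never has to fit in memory.

```python
import os


def split_tsv(input_path, output_dir, rows_per_part, prefix="edges"):
    """Stream a TSV into numbered part files, repeating the header in each.

    Hypothetical helper: assumes the first line of input_path is a header.
    Returns the list of part file paths that were written.
    """
    part_paths = []
    out = None
    rows_in_part = 0
    try:
        with open(input_path, "r", encoding="utf-8", newline="") as src:
            header = src.readline()
            for line in src:
                # Start a new part on the first row, or when the current part is full.
                if out is None or rows_in_part >= rows_per_part:
                    if out is not None:
                        out.close()
                    path = os.path.join(
                        output_dir, f"{prefix}{len(part_paths) + 1}.tsv"
                    )
                    out = open(path, "w", encoding="utf-8", newline="")
                    out.write(header)
                    part_paths.append(path)
                    rows_in_part = 0
                out.write(line)
                rows_in_part += 1
    finally:
        if out is not None:
            out.close()
    return part_paths
```

Each resulting part can then be compressed separately (for example with xz -9), and `rows_per_part` can be lowered until every compressed part fits under the upload limit.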
One of the files generated for upload to the Knowledge Graph Exchange, edges.tsv, is much too large to upload to Git LFS. edges.tsv comes out at 34GB+, and the best compression I can get using xz -9 shrinks it to about 13.9% of its original size. This produces a file that is about 4.7GB, still too large to push to Git LFS, which has a max file size of 4GB.
I propose splitting edges.tsv into two or more TSV files, then compressing these. The resulting compressed edges1.tsv.xz and edges2.tsv.xz should each be small enough to upload.
@saramsey Do you see any potential issues?