Khalimat closed this issue 1 year ago.
Could you send the specific file name, or put the file somewhere I can download it from? It looks like there's a lot of data on that site.
Hi Nick,
Thank you so much!
Yup, I meant the file which contains all protein sequences:
curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get_tape_file?blocking=true&url=/IMG_VR/download/_JAMO/63a22c8a3b5d0133c73fb0a2/IMGVR_all_proteins-high_confidence.faa.gz' -b cookies > IMGVR_all_proteins-high_confidence.faa.gz
Sorry, it seems it was a problem on my HPC side: I resubmitted exactly the same job and it worked.
This was likely a disk-space limit on temporary files. Check that ${TMPDIR} points to a location with enough free space to build the index. For large databases, when the index is too large to sort in RAM, indexing switches to a temporary file and does an on-disk sort.
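A minimal sketch of the ${TMPDIR} check suggested above, for a bash job script. `SCRATCH_TMP` and its `$HOME/tmp` default are hypothetical placeholders, not anything the library defines; substitute a directory on a filesystem with plenty of free space, such as your cluster's scratch area.

```shell
# SCRATCH_TMP is a hypothetical placeholder: point it at a directory on a
# filesystem with plenty of free space before building the index.
SCRATCH_TMP="${SCRATCH_TMP:-$HOME/tmp}"
mkdir -p "$SCRATCH_TMP"

# Export TMPDIR so that programs started from this shell inherit it.
export TMPDIR="$SCRATCH_TMP"

# Show how much space is free where temporary files will be written.
df -h "$TMPDIR"

# Run the indexing command afterwards, in this same shell or job script.
```

Setting the variable inside the job script, rather than only in an interactive login shell, ensures the indexing process actually sees it when it runs on a compute node.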
Thank you! That is useful to know!
Hi Sean,
Thank you for the library!
I get a very cryptic message when I try to index a protein FASTA file from IMG_VR v.4.
My first guess was that some keys were duplicated, but I checked and all of them are unique. I would be very grateful for any advice on how to sort this out.
I am using Rocky Linux.
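The duplicate-key check described in this post can be done with standard tools. This is a sketch that assumes the index key is the first whitespace-delimited word of each header line (the text right after `>`); the function name `find_duplicate_ids` is hypothetical, so adjust the `awk` step if your indexer derives keys differently. Note that GNU `sort` also writes temporary files to `$TMPDIR` (default `/tmp`) when its input does not fit in memory, so on a file this size the check itself needs temporary disk space.

```shell
# Print every sequence ID that appears more than once in a gzipped FASTA file.
# Assumption: the key is the first whitespace-delimited word after '>'.
# For large inputs, sort spills temporary files to ${TMPDIR:-/tmp}.
find_duplicate_ids() {
    gzip -cd -- "$1" |
        awk '/^>/ { print substr($1, 2) }' |
        LC_ALL=C sort |
        uniq -d
}

# Usage: empty output means every ID is unique.
# find_duplicate_ids IMGVR_all_proteins-high_confidence.faa.gz
```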