can't find download locations of uniprot indexes

colin-heberling commented 3 years ago

Where do the uniprot indexes download to when using paladin prepare? I downloaded swiss-prot just fine, but when I try to download uniref it says it failed to write to disk, so I'm guessing it ran out of disk space wherever it was trying to download to. Can I change the default download location, and then specify that location in some kind of config file so that paladin knows where to find the indexed databases?

rmondav commented 3 years ago

Because the authors don't seem to be answering, I thought I'd see if I could help. Unfortunately, I couldn't find where the location is specified in the code, but it will download either to the location you ran the script from or to the directory that paladin is installed in (if they are different). It sounds like you already know the problem is insufficient space. Just so you know, uniref90 is currently 32G, and the whole indexed and prepared uniref90 database takes up close to 400G. You will need approximately 48 hours and 800G RAM to do the prepare/indexing.

ToniWestbrook commented 3 years ago

Hi @colin-heberling and @rmondav - again, apologies for the delay, I recently took a new job at UNH and things have been crazy hectic. When you use the "prepare" command, the reference is downloaded to your current working directory. The "prepare" command also automatically runs the "index" command, and all index-related files (BWT, SA, etc.) are created in the same directory as the reference fasta file (which will be your current working directory if you use the prepare command). So be sure your CWD is a directory with enough space to accommodate both the reference and all the index files.
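Since everything lands in the current working directory, one way to avoid a failed download is to check free space first and then launch paladin from the directory you want. Here is a minimal Python sketch of that idea. The helper names `has_enough_space` and `run_prepare` are hypothetical, the options passed to `prepare` are left to the caller, and the space requirement is whatever estimate you choose (the roughly 400G figure mentioned above is a user estimate, not an official number).

```python
import shutil
import subprocess
from pathlib import Path


def has_enough_space(directory, required_bytes):
    """Return True if the filesystem holding `directory` has at least
    `required_bytes` free."""
    return shutil.disk_usage(directory).free >= required_bytes


def run_prepare(work_dir, prepare_args, required_bytes):
    """Run `paladin prepare` with `work_dir` as the working directory.

    `prepare_args` is the list of options you would normally pass to
    `paladin prepare`; this sketch does not assume any particular flags.
    Raises RuntimeError before starting if the target filesystem looks
    too small.
    """
    work_dir = Path(work_dir)
    if not has_enough_space(work_dir, required_bytes):
        raise RuntimeError(
            f"{work_dir} has less than {required_bytes} bytes free"
        )
    # cwd= makes paladin see work_dir as its current working directory,
    # so the reference and index files are written there.
    return subprocess.run(
        ["paladin", "prepare", *prepare_args],
        cwd=work_dir,
        check=True,
    )
```

Calling `run_prepare("/path/to/big/disk", your_usual_options, 400 * 1024**3)` would have the same effect as running `cd /path/to/big/disk` and then the prepare command by hand, with the space check added up front.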

Also, I suggested this in another thread, but if you want a more complete reference than SwissProt (like UniRef90) and you're unable to index UniRef90 because it's too large (because of either memory or storage restrictions), you can filter out entries you know you won't be mapping to, like HUMAN or other eukaryotes that have a lot of entries but won't be a (valid) target. Obviously there are some cases where you need everything, but this is especially useful for microbial metagenomics. Let me know if you run into any other issues.
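The filtering step could look something like the Python sketch below, which drops whole FASTA records whose header line contains any unwanted term. This is only an illustration and not part of paladin: the function names are hypothetical, and the example terms (`_HUMAN` for Swiss-Prot style entry names, `Tax=Homo sapiens` for UniRef style headers) are assumptions you should check against the headers in your own file.

```python
def filter_fasta(lines, excluded_terms):
    """Yield the lines of every FASTA record whose header contains none of
    `excluded_terms`. Headers of kept records are passed through unchanged."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # A new record starts here: decide whether to keep it.
            keep = not any(term in line for term in excluded_terms)
        if keep:
            yield line


def filter_fasta_file(in_path, out_path, excluded_terms):
    """Stream an uncompressed FASTA file through filter_fasta, so the whole
    reference never has to fit in memory."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, excluded_terms))
```

For example, `filter_fasta_file("uniref90.fasta", "uniref90.filtered.fasta", ["Tax=Homo sapiens"])` would write a smaller reference that can then be indexed. The file names here are placeholders, and a gzipped download would need to be decompressed first (or opened with `gzip.open` in text mode).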

colin-heberling commented 3 years ago

Hi @ToniWestbrook, it looks like when running from the docker image, the prepare command downloads the reference and creates the index files in the docker installation location instead of the current working directory. This could be problematic: I'm trying to use Amazon EC2 to get the computational resources needed to build the index, and I was having some trouble installing the traditional way. Do you think this is something you could fix with the docker build?

ToniWestbrook / paladin

can't find download locations of uniprot indexes #45