Closed: mmcardozo closed this issue 3 years ago.
Hello Magda, how big is the PR2 database? One possibility is that the database is simply too big and should be deduplicated or clustered, although this is unlikely for eukaryotic sequences.
For comparison, the SILVA SSU Ref NR 99 database has about 500,000 entries: https://www.arb-silva.de/documentation/release-1381/
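To put a custom database next to that SILVA figure, it helps to count entries rather than look at file size. A minimal Python sketch (not part of phyloFlash) that counts FASTA records by counting header lines starting with `>`; the script and file names in the usage comment are hypothetical:

```python
import sys


def count_fasta_entries(lines):
    """Count FASTA records: every header line starts with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical names): python count_fasta.py pr2_version.fasta
    with open(sys.argv[1]) as handle:
        print(count_fasta_entries(handle))
```

From a shell, `grep -c '^>' file.fasta` gives the same number.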
Could you please attach the log file tmp.bbmask_mask_repeats.log, if it is not empty?
Hi, the PR2 database is 298M. Here is the log file: tmp.bbmask_mask_repeats.log.txt
Judging from the log file you attached, there seems to be a duplicate entry in the database. Could you remove such duplicates and try again?
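For example, duplicates could be removed with a short script before rerunning phyloFlash_makedb.pl. This is a minimal Python sketch, not a phyloFlash tool. It assumes that "duplicate" means a repeated sequence ID (the first word of the header line); since the thread does not show which kind of duplicate was reported, a `by="sequence"` option that drops repeated sequences instead is also included. In both modes the first occurrence is kept. File names in the usage comment are hypothetical.

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()  # also drops Windows '\r' line endings
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def dedup_fasta(records, by="header"):
    """Keep the first record per key; by='header' (first word) or 'sequence'."""
    seen = set()
    for header, seq in records:
        if by == "header":
            key = (header.split() or [""])[0]
        else:
            key = seq.upper()
        if key in seen:
            continue  # later duplicate: skip it
        seen.add(key)
        yield header, seq


def format_fasta(records, width=80):
    """Yield FASTA text lines, wrapping sequences at `width` characters."""
    for header, seq in records:
        yield ">" + header + "\n"
        for i in range(0, len(seq), width):
            yield seq[i:i + width] + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python dedup_fasta.py in.fasta dedup.fasta
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(format_fasta(dedup_fasta(read_fasta(src))))
```

The deduplicated file can then be passed to phyloFlash_makedb.pl via --silva_file in place of the original. Records are streamed, so only the set of keys already seen is held in memory.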
Hi, yes, I had several repeated entries in the FASTA file. After removing them it worked and took much less time. Many thanks! Magda
Thanks for letting us know! Could you please close this issue? If a related problem comes up you can always open it again.
Hi all, I would like to run phyloFlash with the PR2 database instead of SILVA; I believe this is possible. I modified the file accordingly and tried to create the database as described in the instructions:
phyloFlash_makedb.pl --univec_file /home/ollie/mcardozo/databases/UniVec -overwrite -log makedb.log --silva_file /home/ollie/mcardozo/databases/SILVA_414_pr2_version.fasta
It seems to work, but it has been running for more than 3 days and the job gets killed when it hits the time limit. Is there something I did wrong? Is there a way to make this run faster?
Here is the log file: db.txt
Many thanks in advance, Magda