Question regarding gathering FASTA reads from taxid step

GiantSpaceRobot / FindFungi

A pipeline for the identification of fungi in public metagenomics datasets

Question regarding gathering FASTA reads from taxid step #5

Closed (mhyleung closed this issue 5 years ago)

mhyleung commented 5 years ago

Hi Paul

I have managed to get the pipeline running up to the step "### Gather FASTA reads for each taxid predicted." It appears that a bsub command is needed to get the FASTA reads. I have removed bsub (I am running on a single cluster), but the command still needs

-e [dir]/bsub_reports/ReadNames-to-FASTA.$Taxid.stderr

I am not entirely sure when the files bsub_reports/ReadNames-to-FASTA.$Taxid.stderr are generated if I remove all the bsub calls.

Thanks

Marc

GiantSpaceRobot commented 5 years ago

Hi Marc,

This line is tricky when converting from bsub to a normal Unix system. The file you are referencing is a standard error file, and it is not necessary for the successful completion of the step. Please use this line of code instead:

awk -v reads="$Dir/Processing/ReadNames.$Taxid.txt" -F "\t" 'BEGIN{while((getline k < reads)>0)i[k]=1}{gsub("^>","",$0); if(i[$1]){print ">"$1"\n"$2}}' $Dir/Processing/Reads-From-Kraken-Output.$z.Reformatted.fsa > $Dir/Processing/ReadNames_bsub.$Taxid.fsa
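For anyone adapting the command above, here is a minimal sketch of what it does, run on toy data. It assumes (based on the `-F "\t"` and `$1`/`$2` usage) that the reformatted file holds one read per line as `>name<TAB>sequence`, and that the read-names file holds one name per line. The temporary directory and the file names `names.txt`, `reads.tsv`, `selected.fsa` and `selected.stderr` are hypothetical placeholders, not files the pipeline creates. The `2>` redirect is the plain-shell counterpart of the bsub `-e` option, in case a standard error file is still wanted.

```shell
# Toy data in a scratch directory; all file names here are placeholders.
workdir=$(mktemp -d)

# One read name per line (stands in for ReadNames.$Taxid.txt).
printf 'read1\nread3\n' > "$workdir/names.txt"

# Assumed layout: ">name<TAB>sequence" (stands in for the Reformatted.fsa file).
printf '>read1\tACGT\n>read2\tGGCC\n>read3\tTTAA\n' > "$workdir/reads.tsv"

# BEGIN: load the wanted names into an array.
# Main rule: strip the leading ">", then print matching reads as two-line FASTA records.
# "2>" captures standard error in a file, as bsub's -e option would.
awk -v reads="$workdir/names.txt" -F "\t" '
  BEGIN { while ((getline k < reads) > 0) keep[k] = 1 }
  { gsub("^>", "", $0); if (keep[$1]) print ">" $1 "\n" $2 }
' "$workdir/reads.tsv" > "$workdir/selected.fsa" 2> "$workdir/selected.stderr"

cat "$workdir/selected.fsa"
```

With this toy input, only the records for read1 and read3 are written to `selected.fsa`; read2 is skipped because its name is not in the names file.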

Let me know how that goes.

Paul

mhyleung commented 5 years ago

Hi Paul

Thanks for this. I managed to get this to work, but I have now run into an issue that was raised previously (https://github.com/GiantSpaceRobot/FindFungi/issues/1).

I have made a new post there. You may want to close this thread. Thanks

Regards

Marc