
HRGV / phyloFlash

phyloFlash - A pipeline to rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an Illumina (meta)genomic dataset.

GNU General Public License v3.0


Output files #155

Closed by shunan29 2 years ago

shunan29 commented 2 years ago

Hi, I am new to this and have some questions regarding the output files of phyloFlash. I am using phyloFlash ver 3.3, and am trying to construct the 16S sequences from my metagenome samples. However, I am having difficulty interpreting the output files of phyloFlash. My command is as below:

phyloFlash.pl -lib TEST -read1 test_R1.fastq -read2 test_R2.fastq -almosteverything -log -taxlevel 5 -CPUs 16 -readlength 150 -dbhome /path/to/db/138.1/

However, I do not see any of the described output files other than NTUabundance.csv.

I wonder if there's something wrong with my command that's causing me to not have all the files? I have tried both -almosteverything and -everything, but neither gave me the files I want, which are mainly the assembled/reconstructed sequence files.

Any help would be appreciated, thank you!

kbseah commented 2 years ago

Hello, could you please attach the output of the log file here? It should be named something like TEST.phyloFlash.log with the command line you've given. Do you see any other messages appear on screen? Is the NTUabundance.csv file empty, or does it report something close to what you expected to find in your library?

shunan29 commented 2 years ago

Hi, this is the log generated from -everything. The messages that appeared on the screen are the same as those in the log file. The NTUabundance.csv was not empty, but I would have to run it on a couple more samples to see whether it looks right.

kbseah commented 2 years ago

Thanks for sending this. It looks like very few reads were mapped (24 read pairs), probably because the library is relatively small (~141k read segments), so SPAdes and EMIRGE were unable to assemble anything.

Could you please give the exact command that you used for this library, as well as the output of the command ls -ahl for the folder where you ran the job? Were there other files aside from the NTU abundance CSV table and the phyloFlash log?

shunan29 commented 2 years ago

Hi, thanks for the reply. I guess that is expected due to the nature of the sampling sites.

kbseah commented 2 years ago

Thanks, it looks like the other files were not created because the assembly did not work (not enough input sequence). I misunderstood your initial post to mean that only the NTU abundance file was created.

The log file and phyloFlash HTML report file were successfully created so the run seems to have finished properly. The HTML report file gives a graphical summary which may be helpful.

Unfortunately, there's not much that can be done without higher sequencing depth. If you have individual samples representing a time series or multiple samples from the same locality, you could consider pooling them together in a single phyloFlash run just to see if you can get enough coverage to assemble some full-length SSU sequences.
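As a minimal sketch of the pooling idea, the forward and reverse FASTQ files from several samples could be concatenated, in the same sample order, before a single phyloFlash run. The `pool_fastq` helper and the file names in the comments are hypothetical, and the sketch assumes uncompressed FASTQ files in which the read pairs within each sample are already in matching order.

```python
import shutil
from pathlib import Path


def pool_fastq(inputs, output):
    """Concatenate FASTQ files, in the given order, into one output file."""
    with open(output, "wb") as out_handle:
        for path in inputs:
            with open(path, "rb") as in_handle:
                shutil.copyfileobj(in_handle, out_handle)
                # Make sure the next file starts on a fresh line
                if in_handle.tell() > 0:
                    in_handle.seek(-1, 2)
                    if in_handle.read(1) != b"\n":
                        out_handle.write(b"\n")
    return Path(output)


# Hypothetical usage: list R1 and R2 files in the same sample order
# samples = ["siteA", "siteB", "siteC"]
# pool_fastq([f"{s}_R1.fastq" for s in samples], "pooled_R1.fastq")
# pool_fastq([f"{s}_R2.fastq" for s in samples], "pooled_R2.fastq")
```

The pooled files could then be passed to -read1 and -read2 in the same way as in the original command.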

shunan29 commented 2 years ago

Thanks for the answer! I'll try some more samples to see if they're any different. Just curious: if there aren't enough input sequences to assemble a full-length 16S sequence, how was the OTU table generated? Was it built from partial sequences?

shunan29 commented 2 years ago

Hi, I tried it out with my sample with the most sequences (~6.7M read segments) and it mapped 2433 read pairs. I wonder if this is still insufficient as I still don't have any output sequence files? Thanks!

kbseah commented 2 years ago

The NTU table summarizes the taxonomy of reference sequences that the reads are mapped to.
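To illustrate the idea (a simplified sketch, not phyloFlash's actual code), suppose each mapped read has been assigned the semicolon-delimited taxonomy string of the reference sequence it hit. Truncating each string to a fixed number of ranks and counting the results gives a table similar in spirit to the NTU abundance file. The function name `count_ntus`, the taxonomy strings, and the resulting counts are all made up for illustration.

```python
from collections import Counter


def count_ntus(taxonomy_strings, taxlevel):
    """Count reads per taxon after truncating each taxonomy to `taxlevel` ranks."""
    counts = Counter()
    for taxonomy in taxonomy_strings:
        ranks = [r.strip() for r in taxonomy.split(";") if r.strip()]
        counts[";".join(ranks[:taxlevel])] += 1
    return counts


# Made-up taxonomy strings, one per mapped read (not real phyloFlash output)
hits = [
    "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia",
    "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Salmonella",
    "Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Lactobacillus",
]

# Print one "taxon,read_count" line per taxon, most abundant first
for taxon, n_reads in count_ntus(hits, taxlevel=5).most_common():
    print(f"{taxon},{n_reads}")
```

Because a summary like this only needs reads to map to a reference, it can be produced even when coverage is too low to assemble any full-length sequence.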

Regarding the sample with 2433 read pairs: one possibility is that the sequences are diverse, so no individual sequence has enough coverage to be assembled properly.
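A rough back-of-the-envelope calculation shows how diversity dilutes coverage. It assumes 150 bp reads (the -readlength value in the original command), an SSU rRNA length of about 1,500 bp, and, unrealistically, that mapped reads are spread evenly over the taxa present. The taxon counts are hypothetical.

```python
def mean_coverage(read_pairs, read_length, gene_length, n_taxa):
    """Average per-taxon coverage if mapped reads were split evenly among taxa."""
    total_bases = read_pairs * 2 * read_length
    return total_bases / (gene_length * n_taxa)


READ_PAIRS = 2433    # mapped read pairs reported in the thread
READ_LENGTH = 150    # from -readlength 150
GENE_LENGTH = 1500   # assumed approximate 16S rRNA length in bp

for n_taxa in (1, 10, 100, 500):  # hypothetical numbers of distinct taxa
    cov = mean_coverage(READ_PAIRS, READ_LENGTH, GENE_LENGTH, n_taxa)
    print(f"{n_taxa:>4} taxa: ~{cov:.1f}x mean coverage each")
```

Under these assumptions, a few hundred taxa would leave each one with only a few-fold coverage or less. For the first library's 24 read pairs, the same formula gives under 5x even if every read came from a single taxon. Real communities are uneven, so abundant taxa would fare better than this average suggests.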

shunan29 commented 2 years ago

Got it, thanks!