LangilleLab / microbiome_helper

A repository of bioinformatic scripts, SOPs, and tutorials for analyzing microbiome data.
GNU General Public License v3.0

not a real issue but a help request #4

Closed josemseoane closed 7 years ago

josemseoane commented 7 years ago

Hello. First of all, let me thank you for the great job you did here!

I am running the 16S tutorial and I am stuck on filtering reads by quality and length. When I run

read_filter.pl -q 80 -p 20 -l 270 -thread 4 -c both stitched_reads/.assembledfastq

everything goes smoothly, but all the filtered FASTQ files produced in my filtered_reads folder are empty. I have tried different values for -q, -p and -l, but I always get the same result. My read_filter log looks as if I am filtering out everything:

file initial qFiltered lFiltered forwardFiltered reverseFiltered final qFilteredPercent lFilteredPercent forwardFilteredPercent reverseFilteredPercent finalPercent
0162532-8513-A_S134_L001.assembled.fastq 177157 177157 0 0 0 0 100.0 0.0 0.0 0.0 0.0
0162533-8514-A_S135_L001.assembled.fastq 118053 118053 0 0 0 0 100.0 0.0 0.0 0.0 0.0
0162534-8515-A_S136_L001.assembled.fastq 171294 171294 0 0 0 0 100.0 0.0 0.0 0.0 0.0
0162535-8516-A_S137_L001.assembled.fastq 120859 120859 0 0 0 0 100.0 0.0 0.0 0.0 0.0
0162536-8517-A_S138_L001.assembled.fastq 143947 143947 0 0 0 0 100.0 0.0 0.0 0.0 0.0
0162537-8518-A_S139_L001.assembled.fastq 150560 150560 0 0 0 0 100.0 0.0 0.0 0.0 0.0
0162538-8519-A_S140_L001.assembled.fastq 228830 228830 0 0 0 0 100.0 0.0 0.0 0.0 0.0
0162539-8520-A_S141_L001.assembled.fastq 172526 172526 0 0 0 0 100.0 0.0 0.0 0.0 0.0
0162540-8521-A_S142_L001.assembled.fastq 202045 202045 0 0 0 0 100.0 0.0 0.0 0.0 0.0
0162541-8522-A_S143_L001.assembled.fastq 150209 150209 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152646A-RPCM0001-515fB_S47_L001.assembled.fastq 27934 27934 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152647A-RPCM0009-515fB_S48_L001.assembled.fastq 8814 8814 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152648A-RPCM0018-515fB_S49_L001.assembled.fastq 14862 14862 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152649A-RPCM0038-515fB_S50_L001.assembled.fastq 7433 7433 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152650A-RPCM0041-515fB_S51_L001.assembled.fastq 13096 13096 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152651A-RPCM0050-515fB_S52_L001.assembled.fastq 21090 21090 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152653A-RPCM0057-515fB_S54_L001.assembled.fastq 21564 21564 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152654A-RPCM0093-515fB_S55_L001.assembled.fastq 6502 6502 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152655A-RPCM0129-515fB_S56_L001.assembled.fastq 23722 23722 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152658A-RPCM0157-515fB_S59_L001.assembled.fastq 4486 4486 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152659A-RPCM0161-515fB_S60_L001.assembled.fastq 6640 6640 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152660A-RPCM0177-515fB_S61_L001.assembled.fastq 1285 1285 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152661A-RPCM0194-515fB_S62_L001.assembled.fastq 8754 8754 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152662A-RPCM0209-515fB_S63_L001.assembled.fastq 16174 16174 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152663A-RPCM0217-515fB_S64_L001.assembled.fastq 2504 2504 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152664A-RPCM0225-515fB_S65_L001.assembled.fastq 2806 2806 0 0 0 0 100.0 0.0 0.0 0.0 0.0
152665A-RPCM0242-515fB_S66_L001.assembled.fastq 1398 1398 0 0 0 0 100.0 0.0 0.0 0.0 0.0
9152652A-RPCM0053-515fB_S53_L001.assembled.fastq 14960 14960 0 0 0 0 100.0 0.0 0.0 0.0 0.0
9152656A-RPCM0138-515fB_S57_L001.assembled.fastq 33840 33840 0 0 0 0 100.0 0.0 0.0 0.0 0.0
9152657A-RPCM0145-515fB_S58_L001.assembled.fastq 33899 33899 0 0 0 0 100.0 0.0 0.0 0.0 0.0

I attach my FastQC report for all my samples combined:

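[Image: FastQC report for all samples combined] https://cloud.githubusercontent.com/assets/22190130/23174707/822c4d02-f85d-11e6-8e3d-b9d171dd30ce.png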

How would you correctly set up the filtering for a set of samples which looks like mine?

Thank you very much in advance for your help and for making these tools available to those of us who are not very skilled in programming.

Best regards,

Jose

gavinmdouglas commented 7 years ago

Hey Jose,

It looks like your PE reads did not overlap very well in the middle of your stitched reads, which is why you see that extreme dip in quality. Also, I think you may be confusing the "-q" and "-p" options (you can read about the FASTQ format here: https://en.wikipedia.org/wiki/FASTQ_format, and type "read_filter.pl --help" for the script documentation).

The maximum quality score is 41, so with a cut-off of 80 no base can pass and 0% of your reads will be kept. A quality cut-off of ~30 for >80% of sites would be reasonable, but there is no one right answer.
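
To make the point about the cut-off concrete, here is a minimal Python sketch of a "minimum quality at a minimum percent of sites" check. It is not the code read_filter.pl runs. It assumes Phred+33 encoded quality strings and assumes that "-q" is the per-base quality cut-off and "-p" the percent of bases that must reach it. The function names and the two example reads are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding (offset 33); other encodings need another offset.
    """
    return [ord(ch) - offset for ch in quality_string]


def passes_quality_filter(quality_string, min_quality, min_percent, offset=33):
    """Return True if at least min_percent of bases score >= min_quality."""
    scores = phred_scores(quality_string, offset)
    if not scores:
        return False  # an empty read cannot satisfy the filter
    good = sum(1 for score in scores if score >= min_quality)
    return 100.0 * good / len(scores) >= min_percent


# Hypothetical 10-base reads: 'J' encodes Phred 41 and '#' encodes Phred 2.
best_possible = "J" * 10
mostly_good = "J" * 9 + "#"

# Even a read made only of top-quality bases fails a cut-off of 80.
print(passes_quality_filter(best_possible, min_quality=80, min_percent=20))  # False

# With a cut-off of 30 required at 80% of sites, a mostly good read passes.
print(passes_quality_filter(mostly_good, min_quality=30, min_percent=80))  # True
```

Under these assumptions, a setting such as -q 30 -p 80 corresponds to the second call, while -q 80 can never be satisfied because no base scores above 41.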

Cheers,

Gavin

gavinmdouglas commented 7 years ago

If you have further problems, you can post on our Google group: https://groups.google.com/forum/#!forum/microbiome-helper

josemseoane commented 7 years ago

Thank you Gavin, I think I got it!

Best,

Jose
