amarinderthind / decontaminer

DecontaMiner is a tool designed and developed to investigate the presence of contaminating sequences in unmapped NGS data. It can suggest the presence of contaminating organisms in sequenced samples, which might derive either from laboratory contamination or from their biological source, and in both cases can be considered worthy of further investigation and experimental validation. The novelty of DecontaMiner lies mainly in its easy integration with the standard procedures of NGS data analysis, while providing a complete, reliable, and automatic pipeline. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2684-x
http://www-labgtp.na.icar.cnr.it/decontaminer/

Format error on paired end names! #1

Closed rgarcia27 closed 2 years ago

rgarcia27 commented 2 years ago

Hi, I'm trying to run DecontaMiner. My data is paired-end and in FASTQ format, and I merged the two FASTQ files as the manual says. The problem is that when I run filterBlastInfo.sh I get this error:

Format error on paired end names! Paired end should be numbered as xxx/1 and xxx/2 where xxx is the name of the query

I don't know if I'm merging the FASTQ files incorrectly. Could you help me, please?

Thanks!

amarinderthind commented 2 years ago

Please print the headers of your FASTQ file and check their format.
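One way to do that, as a minimal sketch: a FASTQ record spans four lines, so the read names sit on lines 1, 5, 9 and so on. The helper name print_fastq_headers and the file name merged.fastq are placeholders made up for illustration, not part of DecontaMiner.

```shell
# Print the header line (first line of each 4-line FASTQ record)
# for the first N reads of a file (default: 4 reads).
# print_fastq_headers is a hypothetical helper name.
print_fastq_headers() {
    file="$1"
    n="${2:-4}"
    awk -v n="$n" 'NR % 4 == 1 { print; if (++count >= n) exit }' "$file"
}

# Example call (merged.fastq is a placeholder file name):
#   print_fastq_headers merged.fastq 4
```

Printing a few consecutive headers is usually enough to see whether mates are named xxx/1 and xxx/2 or carry some other suffix.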

DecontaMiner expects the following format:

@A00121:137:HTLF3DSXX:3:1110:3097:35571/1
@A00121:137:HTLF3DSXX:3:1110:3097:35571/2

If your paired-end reads use another format (such as the one in the example below), you can rename them to the required format with the simple Linux command given at the end.

@A00121:137:HTLF3DSXX:3:1110:3097:35571 0:N: 00
@A00121:137:HTLF3DSXX:3:1110:3097:35571 1:N: 00

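For this case, run both substitutions in a single pass in a Linux terminal (adjust the patterns if the suffix in your headers differs). Running them as two separate commands that write to the same output file would let the second overwrite the result of the first:

sed -e 's/ 0:N:0:/\/1/g' -e 's/ 1:N:0:/\/2/g' inputfile > outputfile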

Should work...
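To confirm the renaming took effect before rerunning filterBlastInfo.sh, a quick check along these lines may help. It is only a sketch based on the wording of the error message (names ending in /1 or /2), not on DecontaMiner's actual validation code, and count_bad_headers is a hypothetical helper name.

```shell
# Count FASTQ header lines that do NOT end in /1 or /2.
# A result of 0 means every read name matches the xxx/1, xxx/2 convention.
# count_bad_headers is a hypothetical helper, not part of DecontaMiner.
count_bad_headers() {
    awk 'NR % 4 == 1 && $0 !~ /\/[12]$/ { bad++ } END { print bad + 0 }' "$1"
}

# Example call (outputfile is the file written by the sed command):
#   count_bad_headers outputfile
```

If the count is not zero, the sed patterns probably did not match the suffix in the headers exactly, for example because extra characters follow it.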