When input cfdna fastq data is masked (human reads are masked)

MHH-RCUG / Wochenende

Deprecated, see https://github.com/MHH-RCUG/nf_wochenende: A whole genome/metagenome sequencing alignment pipeline in Python3

https://github.com/MHH-RCUG/nf_wochenende

MIT License


When input cfdna fastq data is masked (human reads are masked) #222

Open arpit20328 opened 2 months ago

arpit20328 commented 2 months ago

Hi @colindaven @sannareddyk @B1T0 @Colorstorm @BioNij

Wochenende ran well when we input cfDNA FASTQ profiles containing human reads along with pathogen reads.

We are interested in how Wochenende will respond when we input the same cfDNA FASTQ profiles but with the human-origin reads masked.

I masked the human reads with the help of this tool: https://github.com/ncbi/sra-human-scrubber

Does masking make a difference if we use the same reference database, and does it change how the results should be interpreted?

I will be using relative abundance, calculated as number of assigned reads / seq_length, as my abundance measure.
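To make the calculation above concrete, here is a minimal sketch in Python. It is not part of Wochenende and not its supported normalization. The function names, the input layout, and the final step that rescales the values so they sum to 1 are all assumptions made for illustration.

```python
def reads_per_base(assigned_reads, seq_length):
    """Number of assigned reads divided by the reference sequence length."""
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    return assigned_reads / seq_length


def relative_abundance(counts):
    """Rescale reads-per-base values so that they sum to 1.

    `counts` is a hypothetical layout: it maps a reference name to a tuple
    (assigned_reads, seq_length). The rescaling is an assumption; the
    original post only defines the per-reference ratio.
    """
    rates = {name: reads_per_base(reads, length)
             for name, (reads, length) in counts.items()}
    total = sum(rates.values())
    if total == 0:
        # No reads assigned anywhere: report zero for every reference.
        return {name: 0.0 for name in rates}
    return {name: rate / total for name, rate in rates.items()}


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = {"organism_A": (100, 1000), "organism_B": (300, 1000)}
    for name, value in relative_abundance(example).items():
        print(name, round(value, 3))
```

Because each value is divided by the total over all references present, removing human reads changes that total, so rescaled values from masked and unmasked runs are not directly comparable unless the human reference is excluded from both.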

colindaven commented 2 months ago

Hi @arpit20328

glad you're getting good results out of the tool.

If you mask human reads then you will get fewer mappings to the human genome. This assumes, of course, that no microbial reads are mistaken for human and masked.

Fewer or no mappings to the human genome will not cause any great problems. It will, of course, break the bacteria per human cell normalization in the reporting step (the step we recommend for normalization), since that relies on human read mappings. Otherwise you should be fine.

If you want to use any other form of abundance calculation, sure, but we only support and encourage use of the normalization in the reporting subdir.

cheers

Maybe @irosenboom has played with masking human reads before?

arpit20328 commented 2 months ago

Thanks @colindaven. Great tool, by the way.

irosenboom commented 2 months ago

Hi @arpit20328 , I am glad that Wochenende runs well on your input fastq files.

I agree with @colindaven that masking the human reads will only break the bacteria per human cell normalization. The other steps work perfectly fine and the pipeline will run even faster.