allind / EukDetect

Samples with uneven read length #27

Closed ChanyeongKim closed 2 years ago

ChanyeongKim commented 2 years ago

Hi, thanks for the nice tools.

Some of my samples have uneven read lengths (e.g. sequencing data from an IonTorrent sequencer). When I tried to run EukDetect with the average read length of the samples, it failed because read length differences of more than 10 bp were found.

Can I use EukDetect on these samples?

allind commented 2 years ago

It can be done, but some of the metrics you get as output can't be used.

To run EukDetect on samples with uneven read lengths, skip the Python package and use the snakemake rules directly, which don't run all of the sanity checks. Instructions are in the README under "Running snakemake directly"; please reach out if you run into issues. It's important that you don't allow any alignments shorter than about 60 bp, because shorter alignments can misalign bacterial sequences to eukaryotic genes, so set the read length in the config file to 75 bp.
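Before settling on the config value, it may help to check how many reads in a sample are actually shorter than the ~60 bp floor mentioned above. Here is a minimal sketch, assuming uncompressed FASTQ with four lines per record. The function name `read_length_summary` and the `min_len` parameter are hypothetical and not part of EukDetect.

```python
from itertools import islice


def read_length_summary(fastq_lines, min_len=60):
    """Summarize read lengths from an iterable of FASTQ lines.

    Assumes plain four-line FASTQ records (header, sequence, '+', qualities).
    min_len is an illustrative threshold, not an EukDetect setting.
    """
    lengths = []
    it = iter(fastq_lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input (or a truncated trailing record)
        lengths.append(len(record[1].strip()))  # second line is the sequence

    if not lengths:
        return {"reads": 0, "shorter_than_min": 0, "min": None, "max": None}

    return {
        "reads": len(lengths),
        "shorter_than_min": sum(1 for n in lengths if n < min_len),
        "min": min(lengths),
        "max": max(lengths),
    }


# Example usage with a hypothetical file name:
# with open("sample.fastq") as fh:
#     print(read_length_summary(fh))
```

If a large share of reads falls below the threshold, that is worth knowing before interpreting the output, since those reads are the ones most at risk of producing short alignments.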

Now for the issues with interpretation: some of the metrics EukDetect reports rely on counts of aligned reads, on the assumption that reads are roughly the same length. The RPKS and EukFrac metrics fall into this category, so you can't rely on them. You'll get more useful information from Total_marker_coverage, which reports the number of bases that have >1 aligned read and does not depend on read lengths being the same.
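To see why read-count metrics break down while a base-coverage metric holds up, here is a toy illustration. It is not EukDetect's implementation: the helper names, the 1000 bp marker, and the simplified definitions (bases touched by an aligned read, reads per kilobase of marker) are assumptions made for the example.

```python
def covered_bases(alignments, marker_len):
    """Count marker positions touched by an alignment.

    alignments is a list of (start, length) tuples with 0-based starts.
    """
    covered = [False] * marker_len
    for start, length in alignments:
        for pos in range(start, min(start + length, marker_len)):
            covered[pos] = True
    return sum(covered)


def reads_per_kb(alignments, marker_len):
    """A simplified read-count rate: aligned reads per kilobase of marker."""
    return len(alignments) / (marker_len / 1000)


MARKER_LEN = 1000  # hypothetical marker gene length in bp

# Two samples that tile the same marker end to end with different read lengths.
short_reads = [(i * 100, 100) for i in range(10)]  # ten 100 bp reads
long_reads = [(i * 250, 250) for i in range(4)]    # four 250 bp reads

print(reads_per_kb(short_reads, MARKER_LEN), covered_bases(short_reads, MARKER_LEN))
print(reads_per_kb(long_reads, MARKER_LEN), covered_bases(long_reads, MARKER_LEN))
```

Both samples cover the same 1000 bases, but the read-count rate differs by a factor of 2.5 purely because of read length. With mixed read lengths inside a single sample, the same distortion applies to any metric built on read counts.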