SamStudio8 / reticulatus

A snakemake-based pipeline for assembling and polishing long genomes from long nanopore reads
MIT License

Kraken rule throws InputFunctionException when provided with Ilm reads #54

Open tanaes opened 3 years ago

tanaes commented 3 years ago

I recently added some Illumina reads for polishing to my Nanopore assembly, and ended up with an empty InputFunctionException when I tried to run it. Weirdly, though, a single job would still kick off and end up producing an output in the directory with the initial reads.

Commenting out the reads kraken line in the finish rule fixed the problem.

https://github.com/SamStudio8/reticulatus/blob/2b4bf40fc229b35089082da3b3a9a8a4aa8976e2/Snakefile-base#L326

Haven't sorted this out yet, but will give it some attention and submit a PR if I can find a fix.

SamStudio8 commented 3 years ago

Oh dear. The InputFunctionException implies an unhandled exception has been raised in enumerate_reads, which also calls get_reads. There must be a logic error somewhere that fails to account for Illumina data. Unfortunately, this doesn't surprise me, as I haven't tested the Illumina bits as robustly...

SamStudio8 commented 3 years ago

Ah on second thought, that's a red herring. I don't think anything is wrong with that enumerate_reads - commenting out that line suppresses the k2kc generating rule (ktkit_count), which is probably where the InputFunctionException is coming from.

Indeed, on closer inspection I have done something horrible. The ktkit_count rule requires a parameter set by get_samplename_from_readpath.

https://github.com/SamStudio8/reticulatus/blob/2b4bf40fc229b35089082da3b3a9a8a4aa8976e2/Snakefile-base#L291

That function - very unhelpfully - flat out raises an Exception if it reaches the end without returning something. This rule was always going to fail for your Illumina data: I hard-coded the lookup to the ont readtype column, because I obviously hate myself and my users.

    try:
        return reads_lookup.loc[reads_lookup['ont'] == path]["samplename"][0]
    except:
        for samplename in reads_lookup["samplename"]:
            if os.path.basename(path).split(".")[0] == samplename:
                return samplename
    raise Exception

SamStudio8 commented 3 years ago

In lieu of a better idea, I suppose the quick fix here would be to try all the possible readtype columns (ont, i1, i2).
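
A minimal sketch of that quick fix, assuming the read table can be handed over as a list of per-sample records keyed by `samplename`, `ont`, `i1` and `i2`. The function name `samplename_for_readpath`, the `rows` argument and the `READ_COLUMNS` constant are hypothetical and are not taken from the reticulatus Snakefile; only the column names and the basename fallback come from the thread.

```python
import os

# Readtype columns named in the thread; the real table may use others (assumption).
READ_COLUMNS = ("ont", "i1", "i2")


def samplename_for_readpath(path, rows, read_columns=READ_COLUMNS):
    """Return the samplename whose read columns contain `path`.

    `rows` is a list of dict records, one per sample (hypothetical layout).
    """
    # First pass: exact match against every readtype column, not just 'ont'.
    for row in rows:
        if any(row.get(col) == path for col in read_columns):
            return row["samplename"]

    # Second pass: same fallback as the original snippet, matching the
    # part of the file name before the first dot against the samplename.
    stem = os.path.basename(path).split(".")[0]
    for row in rows:
        if row["samplename"] == stem:
            return row["samplename"]

    # Fail with a message so the wrapped error is no longer empty.
    raise ValueError(f"no sample found for read path {path!r}")
```

If the table is a pandas DataFrame, as in the snippet above, `reads_lookup.to_dict("records")` would give a list in this shape. Raising with a message rather than a bare `Exception` should also make the error that Snakemake reports easier to trace back to the offending path.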

tanaes commented 3 years ago

Ah I see! Thanks for tracking this down! I will do some fixes on my branch and keep you posted.

SamStudio8 commented 3 years ago

Appreciate that! I'm happy to push a branch as per my suggestion if needed.
