Issue running concoct with crossMapParallel pathway

jorgeln0 commented 3 years ago

I am trying to run concoct in metaGEM after running crossMapParallel. My dataset is large and has already been quality filtered and assembled into contigs. I was able to run my files through megahit and then crossMapParallel, which I chose because it is recommended for large datasets; it wrote the expected files to the kallisto folder. I then ran concoct as the next job in the workflow, which calls kallisto2concoct, but it fails with the following error. Do you know how I can avoid this so I can continue the workflow? Thank you!

P.S. Line 598 has the output file commented out so I removed the "#"

Traceback (most recent call last):
  File "/projectnb2/talbot-lab-data/jlopezna/metaGEM/scripts/kallisto2concoct.py", line 41, in <module>
    main(args)
  File "/projectnb2/talbot-lab-data/jlopezna/metaGEM/scripts/kallisto2concoct.py", line 22, in main
    samplename = samplenames[i]
IndexError: list index out of range

franciscozorrilla commented 3 years ago

Hi Jorge,

Thanks for your interest in metaGEM, and glad you were able to run most of the crossMapParallel subworkflow.

Based on the error message, I suspect that the culprit may be line 18 in the Snakefile:

https://github.com/franciscozorrilla/metaGEM/blob/d81186a0700f974b4f57db587b71b960a951db83/Snakefile#L18

Does your dataset folder contain sample-specific subdirectories? Even if they are empty, metaGEM looks in this folder to determine sample IDs, as shown in the line linked above.
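To illustrate how a missing set of subdirectories could lead to the IndexError above, here is a minimal Python sketch. It is not the actual code from the Snakefile or kallisto2concoct.py: the helper names `sample_ids_from_dataset` and `check_sample_names`, the `dataset/sampleA` layout, and the `abundance.tsv` file names are all assumptions made for the example. The idea is simply that sample IDs come from subdirectory names, and that comparing their count with the number of abundance files gives a clearer failure than indexing past the end of the list.

```python
import tempfile
from pathlib import Path


def sample_ids_from_dataset(dataset_dir):
    """Return the names of the sample-specific subdirectories, sorted."""
    return sorted(p.name for p in Path(dataset_dir).iterdir() if p.is_dir())


def check_sample_names(samplenames, abundance_files):
    """Fail early with a readable message instead of an IndexError later."""
    if len(samplenames) != len(abundance_files):
        raise ValueError(
            f"{len(samplenames)} sample name(s) for "
            f"{len(abundance_files)} abundance file(s)"
        )


# Hypothetical layout: dataset/sampleA and dataset/sampleB (both empty).
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "dataset"
    for name in ("sampleA", "sampleB"):
        (dataset / name).mkdir(parents=True)
    ids = sample_ids_from_dataset(dataset)

# Two names for two files: passes silently.
check_sample_names(ids, ["sampleA/abundance.tsv", "sampleB/abundance.tsv"])
```

If the dataset folder had no subdirectories, `ids` would be empty and the check would raise a ValueError naming both counts.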

Regarding line 598, did you run into any rule dependency resolution errors after uncommenting? There are some commented outputs in the Snakefile for cases where alternative rules can generate the same output, and thus Snakemake would be unable to determine which rule to run. Commenting out the output for the "un-used" rule makes Snakemake happy.

Just curious, what type of sequencing data is it (e.g. human gut), how many samples do you have, and how big are they (e.g. size in GB or number of reads per sample)? Even if you have a lot of samples (e.g. >250), you may be able to use the crossMapSeries subworkflow if the samples are small.

Best wishes, Francisco

franciscozorrilla commented 3 years ago

Closing issue due to inactivity, please reopen if issues arise.

Xentrics commented 2 years ago

I encountered the same issue, but I found the solution. There is an error in the code that lists the sub-directories. In my case, {input} did not end with a trailing /, so instead of listing the sub-directories, it listed only the main directory itself.

--samplenames <(for s in {input}*; do echo $s|sed 's|^.*/||'; done) \

should be

--samplenames <(for s in {input}/*; do echo $s|sed 's|^.*/||'; done) \
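The difference between the two glob patterns can be reproduced outside Snakemake. The sketch below mimics the shell loop in Python (expand the glob, then keep only the part after the last /); the `kallisto` folder and the `sampleA`/`sampleB` names are placeholders, not taken from the workflow.

```python
import glob
import os
import tempfile


def basenames(pattern):
    """Expand the glob and strip everything up to the last '/', like the sed call."""
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    input_dir = os.path.join(tmp, "kallisto")  # note: no trailing slash
    for name in ("sampleA", "sampleB"):
        os.makedirs(os.path.join(input_dir, name))

    without_slash = basenames(input_dir + "*")   # matches the directory itself
    with_slash = basenames(input_dir + "/*")     # matches its sub-directories

print(without_slash)  # ['kallisto']
print(with_slash)     # ['sampleA', 'sampleB']
```

With only one name produced for several samples, an index into the sample-name list runs past its end, which is consistent with the IndexError reported above.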

franciscozorrilla commented 2 years ago

Thanks @Xentrics! Hopefully I will get around to fixing this soon. In case it's fresh on your mind, feel free to submit a PR with the fix; it would be greatly appreciated 🥲

kunaljaani commented 1 year ago

Hi @franciscozorrilla,

I am trying to run concoct on the toy dataset and am getting a "TypeError" (see the attached picture). Surprisingly, I could run both metabat and maxbin, but somehow I am not able to run concoct. Could you please suggest a fix?

Thanks a lot. Kunal

franciscozorrilla commented 1 year ago

Hey @kunaljaani, this error seems to be associated with recent CONCOCT installations, in particular with its sklearn dependency. For example, the three issues in the CONCOCT repo linked below detail how to resolve this recent problem.

https://github.com/BinPro/CONCOCT/issues/321
https://github.com/BinPro/CONCOCT/issues/322
https://github.com/BinPro/CONCOCT/issues/323

Sounds like replacing scikit-learn 1.2 with scikit-learn 1.1 resolves this issue.
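A quick way to see which scikit-learn version is installed in the concoct environment is sketched below. The thread only establishes that 1.2 fails and 1.1 works, so the threshold `first_bad=(1, 2)` is an assumption, and `parse_version` and `is_affected` are hypothetical helper names for this example.

```python
import re


def parse_version(version):
    """Turn a string such as '1.2.0' into (1, 2) so versions compare numerically."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(version, first_bad=(1, 2)):
    """Assumption: releases from 1.2 onwards trigger the CONCOCT TypeError."""
    return parse_version(version) >= first_bad


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("scikit-learn")
    except PackageNotFoundError:
        print("scikit-learn is not installed in this environment")
    else:
        status = "likely affected" if is_affected(installed) else "should be fine"
        print(f"scikit-learn {installed}: {status}")
```

Run it with the Python interpreter of the environment that concoct uses, since other environments may have a different scikit-learn installed.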

kunaljaani commented 1 year ago

Hi Francisco,

Thanks a lot for your rapid reply. Yes, the issue was scikit-learn; changing it to 1.1 worked:

pip install scikit-learn==1.1.0

Thanks a lot. Kunal

franciscozorrilla/metaGEM#57