Maggi-Chen / Inspector

A tool for evaluating long-read de novo assembly results
MIT License

Inspector-correct flye error #1

Closed mmontonerin closed 2 months ago

mmontonerin commented 2 years ago

Hi, I am having some trouble with inspector-correct. Since the script update 4 days ago, it successfully produces the contig_corrected.fasta file. However, in the log of the correction process, every time a structural error needs to be re-assembled with Flye, an error like this occurs:

Base error correction for  ctg001250  finished. Time cost:  0.00589680671692
usage: flye (--pacbio-raw | --pacbio-corr | --nano-raw |
             --nano-corr | --subassemblies) file1 [file_2 ...]
             --genome-size size --out-dir dir_path [--threads int]
             [--iterations int] [--min-overlap int] [--resume]
             [--debug] [--version] [--help]
usage: flye (--pacbio-raw | --pacbio-corr | --nano-raw |
             --nano-corr | --subassemblies) file1 [file_2 ...]
             --genome-size size --out-dir dir_path [--threads int]
             [--iterations int] [--min-overlap int] [--resume]
             [--debug] [--version] [--help]
flye: error: argument -g/--genome-size is required
flye: error: argument -g/--genome-size is required
usage: flye (--pacbio-raw | --pacbio-corr | --nano-raw |
             --nano-corr | --subassemblies) file1 [file_2 ...]
             --genome-size size --out-dir dir_path [--threads int]
             [--iterations int] [--min-overlap int] [--resume]
             [--debug] [--version] [--help]
flye: error: argument -g/--genome-size is required
FLYETIME for  ctg001250__925207__925554__347__exp 0.044429063797
FLYETIME for  ctg001250__931956__933136__1180__exp 0.0446717739105
FLYETIME for  ctg001250__813565__813971__406__exp 0.0445201396942
Inspector Assembly Fail  ctg001250__925207__925554__347__exp
Inspector Assembly Fail  ctg001250__931956__933136__1180__exp
Inspector Assembly Fail  ctg001250__813565__813971__406__exp

It is hard to know whether that contig ends up corrected in the final result or whether the correction failed. Is there any way to avoid the "genome size required" error that Flye is producing?
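One way to see which regions were left uncorrected is to collect the "Inspector Assembly Fail" lines from the log. The following is a minimal sketch, not part of Inspector: the function name `failed_regions` is hypothetical, and it assumes that region labels follow the `contig__start__end__size__type` pattern seen in the log above.

```python
from collections import defaultdict

FAIL_PREFIX = "Inspector Assembly Fail"


def failed_regions(log_lines):
    """Group failed local-assembly regions by contig name."""
    failures = defaultdict(list)
    for line in log_lines:
        line = line.strip()
        if not line.startswith(FAIL_PREFIX):
            continue
        label = line[len(FAIL_PREFIX):].strip()
        # Assumed label layout: contig__start__end__size__type.
        # rsplit keeps any "__" inside the contig name itself intact.
        parts = label.rsplit("__", 4)
        if len(parts) != 5:
            continue  # unexpected label; skip rather than guess
        contig, start, end, size, kind = parts
        if not (start.isdigit() and end.isdigit() and size.isdigit()):
            continue
        failures[contig].append((int(start), int(end), int(size), kind))
    return dict(failures)


# Lines copied from the log in this thread.
sample_log = [
    "FLYETIME for  ctg001250__925207__925554__347__exp 0.044429063797",
    "Inspector Assembly Fail  ctg001250__925207__925554__347__exp",
    "Inspector Assembly Fail  ctg001250__931956__933136__1180__exp",
]
print(failed_regions(sample_log))
```

To use it on a real run, pass an open log file instead of `sample_log`. Any contig that appears in the result still contains at least one structural error that was not replaced.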

Maggi-Chen commented 2 years ago

Hello Mercè,

Based on the log file, the structural errors were NOT corrected, because Flye failed to generate a new sequence to replace the original error-containing sequence in the contig. Can you check which version of Flye you are using? Since Flye 2.8, --genome-size is no longer a required parameter, so an older version might cause this problem. Can you try updating Flye to 2.8.2 or 2.8.3 and see if that helps?
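The version check can be scripted. This is a rough sketch under two assumptions: that `flye --version` prints a string such as `2.8.3-b1695` (the form quoted later in this thread), and that 2.8 is the threshold, as stated above. The helper names are hypothetical.

```python
import re
import subprocess


def parse_flye_version(text):
    """Extract (major, minor, patch) from a version string such as '2.8.3-b1695'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(group) if group else 0 for group in match.groups())


def genome_size_optional(version):
    """True if this Flye version no longer requires --genome-size (2.8 or later)."""
    return version >= (2, 8, 0)


def installed_flye_version():
    """Run `flye --version`; only works when flye is on PATH."""
    output = subprocess.check_output(["flye", "--version"], stderr=subprocess.STDOUT)
    return parse_flye_version(output.decode())


# Demonstration on the version string reported in this thread.
print(genome_size_optional(parse_flye_version("2.8.3-b1695")))
```

Calling `genome_size_optional(installed_flye_version())` inside the environment that runs Inspector shows whether the installed Flye is new enough.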

Bests, Maggi

cherrie-g commented 2 years ago

Hi, I ran into this problem too. The Installation section says to create a Python 2.7 conda environment and install Flye 2.8.3, but I cannot install Flye 2.8.3 in a Python 2.7 environment. Conda warns: "flye=2.8.3 -> python[version='>=3.10,<3.11.0a0|>=3.5,<3.6.0a0']". How did you install Flye?

I also hit another error, and I don't know whether it is caused by the Flye version. The report was:

Traceback (most recent call last):
  File "/path/Inspector/inspector-correct.py", line 104, in <module>
    inspector_correct.error_correction_large(chrominfo,ctginfo[chrominfo],aectg[chrominfo],snpctg[chrominfo],bamfile,outpath,inscor_args.datatype,inscor_args.thread/3)
  File "/path/Inspector/denovo_correct.py", line 409, in error_correction_large
    aeset=findpos(aeset,snpset,bamfile,outpath,datatype,thread)
  File "/path/Inspector/denovo_correct.py", line 105, in findpos
    aestart=int(c.split('\t')[1])
ValueError: invalid literal for int() with base 10: '837048;837369'

Could the wrong version of Flye cause this error?

Maggi-Chen commented 2 years ago

Hello cherrie,

Thank you for the feedback. This is a little surprising, as I recall that Flye does support Python 2.7. I did not have any problems installing Flye in a Python 2.7 environment. I created a new environment with conda create --name testflye python=2.7 and then activated it with conda activate testflye. Flye (version 2.8.3-b1695) was installed with conda install flye=2.8.3 with no warnings. The conda output looks like this:

(testflye) [maggic@login004 ~]$ conda install flye=2.8.3
Solving environment: done

## Package Plan ##

  environment location: /path/anaconda2/envs/testflye

  added / updated specs:
    - flye=2.8.3

The following NEW packages will be INSTALLED:

    flye: 2.8.3-py27h6a42192_1 bioconda

The following packages will be UPDATED:

    openssl: 1.1.1l-h7f98852_0 conda-forge --> 3.0.0-h7f98852_2 conda-forge
    python: 2.7.15-h5a48372_1011_cpython conda-forge --> 2.7.18-h02575d3_0

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

I am not sure why conda requires Python 3 for Flye. If conda cannot install Flye for you, you can always install Flye manually and add it to your PATH. Inspector will work as long as you can execute Flye in your working directory.
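A quick way to confirm that a manually installed Flye is reachable is to look it up on PATH. This is a small sketch, not something Inspector itself does; `missing_tools` is a hypothetical helper, and `shutil.which` requires Python 3.3 or later, so run it with a Python 3 interpreter that sees the same PATH as Inspector.

```python
import shutil


def missing_tools(names):
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# An empty list means the flye executable was found on PATH.
print(missing_tools(["flye"]))
```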

The second error is not related to Flye. It was a bug in an older version of Inspector. We changed the term 'HeterozygosisError' to 'HaplotypeSwitch' in the output file "structural_error.bed", but the error correction function did not recognize the new term and reported this error. If you open "structural_error.bed" and see the word 'HeterozygosisError', cloning the latest version of Inspector should fix it. If instead you see the term 'HaplotypeSwitch', or the latest Inspector still reports this error, I may need you to send me the "structural_error.bed" file so I can see what went wrong.
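The check described above can be done without opening the file by hand. A minimal sketch: the two terms are taken from this comment, while the function name `haplotype_terms` is hypothetical and the scan is a plain substring search rather than Inspector's own parsing.

```python
OLD_TERM = "HeterozygosisError"
NEW_TERM = "HaplotypeSwitch"


def haplotype_terms(bed_lines):
    """Return the set of haplotype-related terms found in non-header lines."""
    found = set()
    for line in bed_lines:
        if line.startswith("#"):
            continue  # skip the column header
        for term in (OLD_TERM, NEW_TERM):
            if term in line:
                found.add(term)
    return found


# Usage on a real file (the path depends on your Inspector output directory):
#     with open("structural_error.bed") as handle:
#         print(haplotype_terms(handle))
example = ["#Contig_Name\tStart_Position", "chr02\t474714;475753\t476346;475754\t87\tHaplotypeSwitch"]
print(haplotype_terms(example))
```

If only the old term is reported, the file was written by an older Inspector. If the new term is reported and the error persists, the file is worth sending along as requested.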

Bests, Maggi

cherrie-g commented 2 years ago

Thank you Maggi. I will install Flye 2.8.3 another way.

As for the second question, I think my Inspector version was the latest. The "structural_error.bed" file looks like this:

#Contig_Name    Start_Position       End_Position        Supporting_Read    Type            Size           Haplotype_Info  Depth_Left    Depth_Right     Depth_Min       Supporting_Read_Name                   Haplotype_Switch_Info
chr02           474714;475753        476346;475754       87                 HaplotypeSwitch Size=1632;124  -/-             79            105             79              m64118_201222_234054/59508416/ccs;...  77;44
chr04           113422678;113422992  113422898;113422993 57                 HaplotypeSwitch Size=220;190   -/-             65            86              65              m64114_201224_060626/94374297/ccs;...  38;19

This is the second situation you mentioned, so could you check what's wrong?
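The rows above show why the traceback ends in a ValueError: HaplotypeSwitch records carry two positions joined by ';', and a plain int() cannot convert that. The sketch below only reproduces the failure and shows one possible way to split the field. The helper `parse_positions` is hypothetical, and how Inspector should interpret the two values is not stated in this thread.

```python
def parse_positions(field):
    """Split a 'pos1;pos2' field into a list of ints; single values work too."""
    return [int(part) for part in field.split(";")]


# Reproduce the failure from the traceback with a value from the table above.
try:
    int("474714;475753")
except ValueError as error:
    print("plain int() fails:", error)

# Splitting on ';' first yields both positions.
print(parse_positions("474714;475753"))
```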

Bests.

mmontonerin commented 2 years ago

I had exactly the same problem during installation, which is why I was using an old version of Flye and getting that error.

When I tried conda install flye=2.8.3, conda told me the package does not exist. When I tried conda install -c bioconda flye=2.8.3, I ran into the Python version problem that cherrie-g described above.

I fixed it by configuring conda to add bioconda as a channel to search for packages (I also added conda-forge; not sure if it was needed):

conda config --add channels bioconda
conda config --add channels conda-forge

Then, in my Python 2.7 environment, I was able to run conda install flye=2.8.3 successfully.

Hopefully now inspector-correct works! :)
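To see whether the channels are already configured before installing, the output of `conda config --show channels` can be inspected. This is a sketch that assumes the command prints a YAML-style list ("channels:" followed by "  - name" lines); the helper names are hypothetical.

```python
def configured_channels(show_output):
    """Extract channel names from the text of `conda config --show channels`."""
    channels = []
    for line in show_output.splitlines():
        line = line.strip()
        if line.startswith("- "):
            channels.append(line[2:].strip())
    return channels


def missing_channels(show_output, required=("bioconda", "conda-forge")):
    """Return the required channels that are not configured yet."""
    present = configured_channels(show_output)
    return [name for name in required if name not in present]


# Example text in the assumed output format, with bioconda not yet added.
sample = "channels:\n  - conda-forge\n  - defaults\n"
print(missing_channels(sample))
```

Each channel reported as missing can then be added with the `conda config --add channels` command shown above.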

cherrie-g commented 2 years ago

It works for me too! Thank you!

mmontonerin commented 2 years ago

I confirm that now it is working perfectly, and finally using Flye to fix contigs!

cherrie-g commented 2 years ago

Hi Maggi, I hit another error. I installed Flye in my Python 2.7 environment as Mercè described and updated to the latest version. But when I ran inspector-correct, errors like these were reported:

Inspector Assembly Fail  chr14__62439214__62439215__183__col
Base error correction for  unpl_scaffold412  finished. Time cost:  0.000214099884033
Base error correction for  unpl_scaffold418  finished. Time cost:  0.000110864639282
Base error correction for  unpl_scaffold32  finished. Time cost:  0.00259304046631
Base error correction for  unpl_scaffold33  finished. Time cost:  0.00162100791931
[2021-11-24 10:00:09] ERROR: Command '['flye-modules', 'polisher', '-h']' returned non-zero exit status 1
[2021-11-24 10:00:09] ERROR: Pipeline aborted
[2021-11-24 10:00:09] ERROR: Command '['flye-modules', 'polisher', '-h']' returned non-zero exit status 1
[2021-11-24 10:00:09] ERROR: Pipeline aborted
FLYETIME for  unpl_scaffold33__2116073__2116074__174__col 5.25100302696
FLYETIME for  unpl_scaffold33__2337240__2337241__188__col 5.25348806381

Although contig_corrected.fa was generated, I think the correction did not finish. Could you tell me what went wrong with my run? @Maggi-Chen

Maggi-Chen commented 2 years ago

Hello cherrie,

Yes, you are right. The contig_corrected.fa file was generated, but structural error correction was skipped and only smaller errors were corrected. Based on the log, it seems that local assembly with Flye has failed (the Flye pipeline aborted). If you go to the Inspector output directory and then into the 'assemble_workspace' directory, there should be some FASTA files named like 'read_asscontigname....fa'. These are the reads that Inspector uses for local de novo assembly with Flye. Can you check whether these files are valid (i.e. correct FASTA format, containing some reads/sequences)? If they are, can you pick any one FASTA file, run Flye on it with default settings, and see whether Flye can finish the polishing step?
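A basic validity check of those read files can be scripted. This is a minimal sketch of a FASTA sanity check, not Inspector's own code: `count_fasta_records` is a hypothetical helper that only verifies that sequence lines follow a '>' header and counts records and bases.

```python
def count_fasta_records(lines):
    """Return (n_records, n_bases); raise ValueError if sequence precedes a header."""
    records = 0
    bases = 0
    for number, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            records += 1
        elif records == 0:
            raise ValueError("line %d: sequence data before the first '>' header" % number)
        else:
            bases += len(line)
    return records, bases


# Usage on a real file (substitute one of the files in assemble_workspace):
#     with open(path_to_read_file) as handle:
#         print(count_fasta_records(handle))
print(count_fasta_records([">read1", "ACGT", "AC", ">read2", "GG"]))
```

A file that reports zero records or zero bases gives Flye nothing to assemble, which would be one explanation for the aborted pipeline.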

Also, can you double-check that Inspector's '--datatype' parameter is consistent with your sequencing data? The text needs to be exactly "pacbio-raw", "pacbio-hifi", or "nano-raw".
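Because the value must match exactly, a wrapper script can reject typos before launching a long run. A small sketch: the three accepted strings come from the comment above, and `check_datatype` is a hypothetical helper, not an Inspector function.

```python
VALID_DATATYPES = ("pacbio-raw", "pacbio-hifi", "nano-raw")


def check_datatype(value):
    """Return `value` unchanged if it is an accepted --datatype string, else raise."""
    if value not in VALID_DATATYPES:
        raise ValueError(
            "unsupported datatype %r; expected one of %s"
            % (value, ", ".join(VALID_DATATYPES))
        )
    return value


print(check_datatype("pacbio-hifi"))
```

The comparison is case-sensitive on purpose, so a value such as "PacBio-HiFi" is rejected.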

PS. I am still working on the HaplotypeSwitch issue. Do you still see that error after updating Inspector?

Bests, Maggi

frankmyou9172 commented 1 year ago

Hi Maggi, I recently used Inspector to polish wheat ONT contigs and a flax HiFi assembly with PacBio CCS HiFi reads. For some contigs, although the contigs are corrected, I still get an error message. For example, when polishing the flax genome with PacBio HiFi reads:

Base error correction for Novelty_Lu12 finished. Time cost: 0.0370790958404541
Base error correction for Novelty_Lu8 finished. Time cost: 0.05165505409240723
Base error correction for Novelty_Lu14 finished. Time cost: 0.048223018646240234
Inspector Assembly Fail Novelty_Lu14813298133059col
Inspector Assembly Fail Novelty_Lu105500804550080552col
Base error correction for Novelty_Lu11 finished. Time cost: 0.0274808406829834
Base error correction for Novelty_Lu13 finished. Time cost: 0.03021693229675293

I went to 'assemble_workspace' and found the sequences. I assembled each sequence file with Flye using default parameters and got one assembled contig per file, indicating that the assembly was successful. Why, then, was there an "Inspector Assembly Fail" message for that contig?

When polishing the wheat ONT assembly, I got the same error message for more than 2000 ONT contigs:

Inspector Assembly Fail utg3441962799196285657exp
Inspector Assembly Fail utg34487521248752125297col
Inspector Assembly Fail utg3442567607825676079106col
Inspector Assembly Fail utg344370554703705547168col
....

How can we solve this?

Many thanks for your help.

Frank