bioforensics / yeat

YEAT: Your Everyday Assembly Tool

Adding `metaMDBG` into YEAT #57

Closed danejo3 closed 9 months ago

danejo3 commented 10 months ago

metaMDBG is a phenomenal tool for assembling metagenomic data. During my testing, I found that I could construct small plasmids where other popular long-read assemblers could not.

One of the biggest roadblocks to adding metaMDBG to YEAT's workflow is that it cannot be installed in the same Conda environment as YEAT. The solver cannot reconcile openssl: wfmash 0.8.2 requires htslib 1.15.1, which pins openssl 1.1.1, while the environment's Python 3.9.18 requires openssl >=3.1.2 (see the comment below).

To fix this, metaMDBG will need its own Conda environment, which YEAT activates during the Snakemake workflow.

Another problem with installing metaMDBG is that wfmash can only be installed on Linux. As a result, metaMDBG cannot be used on macOS.
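A minimal sketch of how a workflow could guard against the Linux-only restriction. The helper name `metamdbg_supported` is hypothetical and not part of YEAT; it only illustrates the check.

```python
import platform


def metamdbg_supported(system=None):
    """Return True only on Linux, the one platform where wfmash installs.

    `system` defaults to the value reported by platform.system(), which is
    "Linux" on Linux, "Darwin" on macOS, and "Windows" on Windows.
    """
    if system is None:
        system = platform.system()
    return system == "Linux"
```

A caller could skip or reject metaMDBG runs early when this returns False, instead of failing later during environment creation.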

Even though metaMDBG's OS requirement is a bit concerning, the situation is also helpful in a way. Because of the package conflicts and the Linux requirement, we can separate metaMDBG into its own environment and check whether that environment exists before running it.
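One way the existence check could look, as a sketch only. The environment name `yeat-metamdbg` and both function names are assumptions made for illustration; the only external call is `conda env list --json`, whose output contains an `envs` list of environment paths.

```python
import json
import subprocess
from pathlib import Path


def env_names(conda_json):
    """Extract environment names from the text of `conda env list --json`."""
    envs = json.loads(conda_json).get("envs", [])
    # Each entry is a path; the last path component is the environment name.
    return {Path(env_path).name for env_path in envs}


def metamdbg_env_exists(env_name="yeat-metamdbg", conda_json=None):
    """Check whether a separate metaMDBG environment is present.

    `env_name` is a hypothetical name. Pass `conda_json` to avoid calling
    conda (useful in tests); otherwise conda is queried directly.
    """
    if conda_json is None:
        result = subprocess.run(
            ["conda", "env", "list", "--json"],
            capture_output=True,
            text=True,
            check=True,
        )
        conda_json = result.stdout
    return env_name in env_names(conda_json)
```

Note that the base environment shows up under its install directory name (for example `miniconda3`), not as `base`, so this approach suits named environments only.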

danejo3 commented 10 months ago
(yeat-test) danejo@LAPTOP-PA3SAS5V:~/projects/metaMDBG$ mamba env update -f conda_env.yml
pkgs/main/linux-64                                            No change
pkgs/r/linux-64                                               No change
pkgs/r/noarch                                                 No change
bioconda/linux-64                                             No change
pkgs/main/noarch                                              No change
bioconda/noarch                                      4.7MB @   2.8MB/s  1.7s
conda-forge/noarch                                  12.3MB @   4.2MB/s  3.0s
conda-forge/linux-64                                30.0MB @   5.4MB/s  5.9s

Looking for: ['zlib', 'openmp', 'cxx-compiler', 'cmake', 'gsl==2.7=he838d99_0', 'wfmash=0.8.2', 'samtools=1.6', 'minimap2=2.24']

warning  libmamba Added empty dependency for problem type SOLVER_RULE_UPDATE
Could not solve for environment specs
The following packages are incompatible
├─ python is installable with the potential options
│  ├─ python [1.0.1|1.2|...|3.9.9], which can be installed;
│  └─ python 3.9.18 would require
│     └─ openssl >=3.1.2,<4.0a0 , which can be installed;
└─ wfmash 0.8.2**  is not installable because it requires
   └─ htslib >=1.15.1,<1.16.0a0  but there are no viable options
      ├─ htslib 1.15.1 would require
      │  └─ openssl >=1.1.1q,<1.1.2a , which conflicts with any installable versions previously reported;
      └─ htslib 1.15.1 would require
         └─ openssl >=1.1.1n,<1.1.2a , which conflicts with any installable versions previously reported.

danejo3 commented 10 months ago

Found an interesting problem. All gfa files produced by metaMDBG are binary except one: assembly_graph.gfa. When Bandage is run on the other files, it fails to open them because the contents are unparseable.

metaMDBG produces many gfa files. In the output directory, the tmp directory contains 5 gfa files:
- assembly_graph.gfa
- assembly_graph.gfa.unitigs
- assembly_graph.gfa.unitigs.nodepath
- minimizer_graph.gfa
- minimizer_graph_debug.gfa

metaMDBG makes multiple passes with different k-mer sizes when assembling, and each pass creates a gfa file. I've decided to forgo creating a Bandage image for each pass because there are at least 140 k-mer sizes.

Other than assembly_graph.gfa, the files were detected as: MIME Type: application/octet-stream; Suggested file extension(s): bin lha lzh exe class so dll img iso
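A rough sketch of how the binary files could be filtered out before handing anything to Bandage. The function name `looks_like_text_gfa` is hypothetical, and the heuristic assumes GFA1-style text, where every non-empty line starts with a record type such as H, S, L, C, P, W, or a `#` comment. It is not how metaMDBG or Bandage decide this.

```python
GFA1_RECORD_TYPES = set("HSLCPW#")  # assumed GFA1 line prefixes


def looks_like_text_gfa(path, sample_size=4096):
    """Heuristically decide whether a file is plain-text GFA1.

    Reads only the first `sample_size` bytes, rejects anything containing
    NUL bytes or non-ASCII data, and then requires every non-empty line in
    the sample to start with a known GFA1 record type.
    """
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if not sample or b"\x00" in sample:
        return False
    try:
        text = sample.decode("ascii")
    except UnicodeDecodeError:
        return False
    lines = [line for line in text.splitlines() if line]
    return bool(lines) and all(line[0] in GFA1_RECORD_TYPES for line in lines)
```

With a check like this, only files that pass would be sent to Bandage, and the rest could be skipped with a warning.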