Shamir-Lab / Recycler

This is the codebase for Recycler, described in our manuscript: https://academic.oup.com/bioinformatics/article/33/4/475/2623362, by Roye Rozov, Aya Brown Kav, David Bogumil, Naama Shterzer, Eran Halperin, Itzhak Mizrahi, and Ron Shamir
BSD 3-Clause "New" or "Revised" License

make_fasta_from_fastg.py #6

Closed rprops closed 7 years ago

rprops commented 7 years ago

Hi

The make_fasta_from_fastg.py script throws an error when I apply it to a MEGAHIT-produced FASTG file. It asks for a BAM file, although according to your example on how to create the BAM file that command should not require one. (screenshot attached)

make_fasta_from_fastg.py -g assembly_graph.fastg

Running it with the BAM file (reads vs contigs used to create fastg) gives this error: (screenshot attached)

Thanks

Ruben

shenwei356 commented 7 years ago

Hi @rprops, you forgot to paste the error message.

rprops commented 7 years ago

Hi @shenwei356, you can see the error messages in the attached screenshots. In text: make_fasta_from_fastg.py just says "error: argument -b/--bam is required", and the other error message says "NameError: name 'get_fastg_digraph' is not defined".

shenwei356 commented 7 years ago

Which version are you using?

make_fasta_from_fastg.py does not have the option -b:

usage: make_fasta_from_fastg.py [-h] -g GRAPH

recycle extracts cycles likely to be plasmids from metagenome and genome
assembly graphs

optional arguments:
  -h, --help            show this help message and exit
  -g GRAPH, --graph GRAPH
                        (spades 3.50+) FASTG file to process [recommended:
                        before_rr.fastg]

You may read the doc again:

https://github.com/Shamir-Lab/Recycler#preparing-the-bam-input

rprops commented 7 years ago

I am using the latest version v0.61. I know that it should not require the -b option. But it tells me to provide it anyway. Here is what I get by calling the help option:

[rprops@flux-login1 plasmid]$ make_fasta_from_fastg.py -h
usage: make_fasta_from_fastg.py [-h] -g GRAPH -k MAX_K -b BAM [-l LENGTH]
                                [-m MAX_CV] [-i ISO]

Recycler extracts likely plasmids (and other circular DNA elements) from de
novo assembly graphs

optional arguments:
  -h, --help            show this help message and exit
  -g GRAPH, --graph GRAPH
                        (spades 3.50+) assembly graph FASTG file to process;
                        recommended for spades 3.5: before_rr.fastg, for
                        spades 3.6+:assembly_graph.fastg
  -k MAX_K, --max_k MAX_K
                        integer reflecting maximum k value used by the
                        assembler
  -b BAM, --bam BAM     BAM file resulting from aligning reads to contigs
                        file, filtering for best matches
  -l LENGTH, --length LENGTH
                        minimum length required for reporting [default: 1000]
  -m MAX_CV, --max_CV MAX_CV
                        coefficient of variation used for pre-selection
                        [default: 0.5, higher--> less restrictive]
  -i ISO, --iso ISO     True or False value reflecting whether data sequenced
                        was an isolated strain

shenwei356 commented 7 years ago

Something is wrong. The usage you pasted is actually that of recycle.py (source code)

How did you install Recycler? Can you install from source (https://github.com/Shamir-Lab/Recycler#installation)?

rprops commented 7 years ago

Yeah I thought that was the issue. I did install from source as such:

git clone https://github.com/rozovr/Recycler.git
cd Recycler
python setup.py install --user

I'm running python-anaconda2/201607. Installation runs smoothly. I've tried uninstalling with pip and then reinstalling but the same errors appear.

shenwei356 commented 7 years ago

I'll try to fix it and submit a PR.

shenwei356 commented 7 years ago

Until PR #7 is merged, you can use my fork. It works.

git clone https://github.com/rozovr/Recycler.git
cd Recycler
python setup.py install --user

rozovr commented 7 years ago

Hi @shenwei356 and @rprops,

can you help me understand what you think is the problem, before I make the merge? I'd like to understand what led to this issue.

Thanks, R

shenwei356 commented 7 years ago

To make it easier to understand, I wrote some test code to simulate what's going on:

Before @druvus 's commit, the file structure was

.
├── make_fasta_from_fastg.py
├── recycle
│   ├── __init__.py
│   └── lib.py
└── recycle.py

Code:

$ cat recycle/lib.py 
print('library recycle.lib imported')

$ cat make_fasta_from_fastg.py 
print('run make_fasta_from_fastg.py')
import recycle
print('hello from make_fasta_from_fastg.py')

$ cat recycle.py 
print('run recycle.py')
from recycle import *
print('hello from recycle.py')

Everything was OK:

$ python make_fasta_from_fastg.py 
run make_fasta_from_fastg.py
hello from make_fasta_from_fastg.py

$ python recycle.py 
run recycle.py
hello from recycle.py

However, after the scripts were moved into the directory bin, the weird behavior we discussed above appears: make_fasta_from_fastg.py prints the usage of recycle.py!

$ python make_fasta_from_fastg.py 
run make_fasta_from_fastg.py
run recycle.py
hello from recycle.py
hello from make_fasta_from_fastg.py

$ python recycle.py
run recycle.py
run recycle.py
hello from recycle.py
hello from recycle.py

This is caused by Python's import search order. Consider the import line in make_fasta_from_fastg.py:

from recycle.utils import readfq

This tries to import the library recycle. Before searching for recycle in PYTHONPATH, Python searches the directory containing the script being run (the first entry of sys.path). After the commit, there is no longer a library directory recycle next to the script (there was before the commit). "Luckily", a script recycle.py is in the same directory, so make_fasta_from_fastg.py imports recycle.py instead.
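
The lookup rule described above can be checked with a minimal sketch. It assumes Python 3 (the thread itself uses Python 2.7) and uses a throwaway temporary directory; the helper names resolve_in_dir and demo are made up for illustration and are not part of Recycler. It asks the standard importlib file finder what the name recycle resolves to inside a single directory, first with only recycle.py present and then with a package directory recycle/ next to it.

```python
import os
import tempfile
from importlib.machinery import FileFinder, SourceFileLoader, SOURCE_SUFFIXES


def resolve_in_dir(name, directory):
    """Return the file that `import name` would load from `directory`, or None.

    Hypothetical helper: it inspects one directory only, not the whole sys.path.
    """
    finder = FileFinder(directory, (SourceFileLoader, SOURCE_SUFFIXES))
    spec = finder.find_spec(name)
    return spec.origin if spec else None


def demo():
    with tempfile.TemporaryDirectory() as d:
        # Layout after the move to bin/: only the script recycle.py is present.
        open(os.path.join(d, "recycle.py"), "w").close()
        after_move = resolve_in_dir("recycle", d)

        # Layout before the move: the package recycle/ sits in the same directory.
        os.mkdir(os.path.join(d, "recycle"))
        open(os.path.join(d, "recycle", "__init__.py"), "w").close()
        before_move = resolve_in_dir("recycle", d)

        # Report paths relative to the temporary directory.
        return os.path.relpath(after_move, d), os.path.relpath(before_move, d)


if __name__ == "__main__":
    print(demo())
```

Within one directory, a package directory takes precedence over a module file of the same name. That would explain why the old layout worked even though recycle.py was already sitting next to the recycle package.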


To solve this, we can rename either the script recycle.py or the library recycle; of course, I chose the latter.
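
The shadowing and the effect of renaming the library can be reproduced end to end with a rough sketch, again assuming Python 3. The bin/ and lib/ layout is built in temporary directories, lib/ stands in for the installed location and is passed via PYTHONPATH, and the new package name recycle_lib is a hypothetical name chosen for illustration (the thread does not state the name used in the actual fix).

```python
import os
import subprocess
import sys
import tempfile


def write(path, text):
    """Create parent directories as needed and write `text` to `path`."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(text)


def run_script(root, package_name):
    """Build bin/ and lib/ under `root`, run the script, return its stdout lines."""
    bin_dir = os.path.join(root, "bin")
    lib_dir = os.path.join(root, "lib")
    # The library, installed outside the script directory.
    write(os.path.join(lib_dir, package_name, "__init__.py"),
          "print('library imported')\n")
    # The two scripts, side by side in bin/.
    write(os.path.join(bin_dir, "recycle.py"), "print('run recycle.py')\n")
    write(os.path.join(bin_dir, "make_fasta_from_fastg.py"),
          "import %s\n" % package_name)

    env = dict(os.environ, PYTHONPATH=lib_dir)
    # Make sure the script directory is still prepended to sys.path (Python 3.11+).
    env.pop("PYTHONSAFEPATH", None)
    result = subprocess.run(
        [sys.executable, os.path.join(bin_dir, "make_fasta_from_fastg.py")],
        env=env, stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return result.stdout.splitlines()


def demo():
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        shadowed = run_script(a, "recycle")    # library name collides with recycle.py
        fixed = run_script(b, "recycle_lib")   # hypothetical renamed library
        return shadowed, fixed


if __name__ == "__main__":
    print(demo())
```

In the first run the script picks up bin/recycle.py instead of the library; in the second run, with the library renamed, the import reaches the package on PYTHONPATH.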

rprops commented 7 years ago

Thanks @shenwei356 this fixed the issue I had with make_fasta_from_fastg.py. I could now generate a correctly formatted BAM file. However, when running recycle.py, I get another error stating NameError: name 'get_fastg_digraph' is not defined.

[rprops@nyx7011 plasmid]$ recycle.py -g k99.fastg -k 99 -b reads_pe_primary.sort.bam -i False
Traceback (most recent call last):
  File "/home/rprops/.local/bin/recycle.py", line 4, in <module>
    __import__('pkg_resources').run_script('recycler==0.62', 'recycle.py')
  File "/home/rprops/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 738, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/rprops/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1506, in run_script
    exec(script_code, namespace, namespace)
  File "/home/rprops/.local/lib/python2.7/site-packages/recycler-0.62-py2.7.egg/EGG-INFO/scripts/recycle.py", line 69, in <module>

NameError: name 'get_fastg_digraph' is not defined

shenwei356 commented 7 years ago

Oh, I fixed it. Please reinstall.

rprops commented 7 years ago

Excellent, that fixed it. Thank you so much for the quick response @shenwei356! I think we can close this issue!

shenwei356 commented 7 years ago

Not yet, we need to wait for @rozovr