PacificBiosciences / FALCON_unzip

Making diploid assembly becomes common practice for genomic study
BSD 3-Clause Clear License

python -m not finding function for get_read_ctg_map #56

Closed YourePrettyGood closed 7 years ago

YourePrettyGood commented 7 years ago

Hi, First off, thanks a ton for all your work on FALCON and FALCON_unzip! I've been using FALCON for about a year now, and the results are great!

--Note-- This may be related to FALCON issue #499: https://github.com/PacificBiosciences/FALCON/issues/499

I've run into a series of issues trying to run FALCON_unzip on one of my assemblies (genome size estimate of between 380 and 430 Mb, but the p_ctg.fa comes out near 580-590 Mb, so I suspect haplotigs are being kept together in p_ctg.fa).

The first error is a Python exception (an AttributeError) saying that the 'module' object does not have attribute 'get_read_ctg_map'. This seems to trace back to the python -m get_read_ctg_map call in track_reads.sh generated by task_track_reads() in unzip.py. I've also tried manually calling python -m get_read_ctg_map with and without deleting mypwatcher and other relevant directories and files, basically resetting the FALCON_unzip run to 0.

What did succeed from there was running the DBshow commands manually, and running the (slightly-modified) contents of the generate_read_to_ctg_map() function.

After that, I'm able to run python -m rr_ctg_track and python -m pr_ctg_track successfully, but then hit another error running python -m fetch_reads, which complains that input.fofn does not exist as a file. I used a non-canonical input FOFN name and included it in my fc_unzip.cfg, but it doesn't seem to be used in task_track_reads() in unzip.py. (It should be easy to tack it into the config[] dictionary in main() in unzip.py, assign it to a variable in task_track_reads(), and then throw that after a --fofn in the script definition for track_reads.sh.)
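The FOFN suggestion above could be sketched roughly as follows. This is only an illustration in Python 3, not the actual unzip.py code: the config key "input_fofn" and the helper name build_fetch_reads_command are hypothetical, and whether fetch_reads accepts --fofn is taken from the suggestion itself rather than verified.

```python
import shlex

# Default name that fetch_reads reportedly looks for (per the error in this thread).
DEFAULT_FOFN = "input.fofn"


def build_fetch_reads_command(config):
    """Build the fetch_reads line of a script, passing the FOFN explicitly.

    `config` is a plain dict; the "input_fofn" key is a hypothetical name
    used only for this sketch.
    """
    fofn = config.get("input_fofn", DEFAULT_FOFN)
    # shlex.quote protects paths containing spaces or shell metacharacters.
    return "python -m fetch_reads --fofn {}".format(shlex.quote(fofn))


if __name__ == "__main__":
    print(build_fetch_reads_command({"input_fofn": "my reads.fofn"}))
    print(build_fetch_reads_command({}))
```

Falling back to the default keeps existing runs unchanged while letting a non-canonical FOFN name flow through from the config.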

And a third note (not an error, but a suggestion), maybe pypeFLOW already does this, but it'd be good to avoid repeating get_read_ctg_map, rr_ctg_track, pr_ctg_track, and fetch_reads if their output files already exist. Since I manually ran the steps, as soon as I tried to re-run fc_unzip.py to get to alignment and phasing, FALCON_unzip tried to redo all the steps I had run manually, and failed again on get_read_ctg_map.
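To make the skip-if-done idea concrete, here is a minimal sketch in Python 3. It is not how pypeFLOW decides what to rerun (which may well be more sophisticated); run_step and outputs_exist are hypothetical helpers, and "done" is assumed to mean that every expected output file exists and is non-empty.

```python
import os


def outputs_exist(paths):
    """Return True only if every expected output is a non-empty regular file."""
    return all(os.path.isfile(p) and os.path.getsize(p) > 0 for p in paths)


def run_step(outputs, action):
    """Call action() unless all outputs are already present.

    Returns True if the step was executed, False if it was skipped.
    """
    if outputs_exist(outputs):
        return False
    action()
    return True
```

A check like this is only safe when a step writes its outputs atomically (for example, to a temporary name that is renamed on success); otherwise a partial file from a crashed run would be mistaken for a finished one.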

Version info: I tried first with FALCON_unzip c936cb9 on the p_ctg.fa generated by FALCON 0.4.0 (don't remember which commit), and then installed the newest via FALCON-integrate a few days ago and re-ran to see what might be happening. Newest versions used:

FALCON-integrate: 3b36fd91d36149ec000ab35fbd1e8b7646f8e95c
FALCON: 87b6262607a885979727592c5aa0dad459085f33
FALCON-make: 02ed9dedb0d5bc7bec75bfad8aa0a92629ad0099
FALCON_unzip: c936cb94a63cc763f9733d2f497ccc00a68ba02c
pypeFLOW: bdd41dbc5ac4e4cf1dec1fa57e52850b399d85aa

Install was via PYTHONUSERBASE, I'm using SLURM, and I unset PYTHONPATH and source env.sh prior to running fc_unzip.py each time. I've tried with Python 2.7.3 and 2.7.12 to see if that makes a difference (maybe they fiddled with the handling of the -m flag?).

Potential source for the first error: Comparing the scripts getting called with python -m, it looks like get_read_ctg_map.py is the only one that uses any makePypeLocalFile() calls and runs functions via a PypeTask. Is this even necessary? It seems that get_read_ctg_map.py isn't terribly computationally intensive compared to the others, so adding extra Pype tasks seems an unnecessary additional complexity. Attached is an example (and apologies, but I forgot to remove the PypeProcWatcherWorkflow part). get_read_ctg_map.py.txt

Again, thanks for your time and all your great work!

Best regards, Patrick Reilly

pb-cdunn commented 7 years ago

No, PypeTasks aren't really needed for that script. But it should still run. You have some kind of version mismatch. I see this:

python -m falcon_kit.mains.get_read_ctg_map

You report this:

FALCON_unzip: c936cb9

I do not believe that you are actually using that version.

26034b9f falcon_unzip/unzip.py (Christopher Dunn 2016-11-21  48) python -m falcon_kit.mains.get_read_ctg_map

That commit is included under yours:

* 1820d82 (HEAD, origin/master, origin/HEAD, master) Rm copyrighted script
* c936cb9 Drop FALCON submodule
* b04f104 Drop TaskBase, URL from PypeTask
* 8e4f06e Moved README into wiki, so we do not need to edit the code-tree
*   22c492e Merge branch 'simple' into master
|\
| * d2dd81a Update README.md
| * 1bb0010 Fix some task dependencies
| * a532bb4 unzip works now
| * aa0700b fix read_map dirs
| * 7a10666 Drop setNumThreadAllowed, old pypeflow use
| * e1d73e6 PEP-8 spacing
| * 60d88aa single quotes
| * 26034b9 simpler script def
...

So I think you have an integration problem, which is by far the most difficult thing for us to address remotely.

Note that FALCON-integrate/FALCON-make does not yet install FALCON_unzip. You still have to do that yourself. (I will address that this weekend.)

YourePrettyGood commented 7 years ago

Sorry, that was a typo on my part, I did run python -m falcon_kit.mains.get_read_ctg_map and it produced the AttributeError as stated above.

Here's the first commit from the output of git log in my FALCON_unzip directory. (After installing FALCON-integrate, I had to run python setup.py install --prefix=[path to fc_env subdirectory of FALCON-integrate folder] in that directory, and I don't see any errors from python setup.py install.)

$ git log
commit c936cb94a63cc763f9733d2f497ccc00a68ba02c
Author: Christopher Dunn cdunn2001@gmail.com
Date: Sun Nov 27 12:19:30 2016 -0600

    Drop FALCON submodule

Here's the command and output generated by running that manually:

$ python -m falcon_kit.mains.get_read_ctg_map
WARNING:pypeflow.simple_pwatcher_bridge:In simple_pwatcher_bridge, pwatcher_impl=<module 'pwatcher.fs_based' from '/Genomics/grid3/users/preilly/bin/FALCON_0.7.0/FALCON-integrate/pypeFLOW/pwatcher/fs_based.pyc'>
ERROR:pypeflow.simple_pwatcher_bridge:Task Node(2-asm-falcon/read_maps/dump_rawread_ids) failed with exit-code=256
ERROR:pypeflow.simple_pwatcher_bridge:Some tasks are recently_done but not satisfied: set([Node(2-asm-falcon/read_maps/dump_rawread_ids)])
ERROR:pypeflow.simple_pwatcher_bridge:ready: set([]) submitted: set([Node(2-asm-falcon/read_maps/dump_pread_ids)])
Traceback (most recent call last):
  File "/usr/local/python/2.7.12/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/python/2.7.12/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/Genomics/grid3/users/preilly/bin/FALCON_0.7.0/FALCON-integrate/FALCON/falcon_kit/mains/get_read_ctg_map.py", line 137, in <module>
    main()
  File "/Genomics/grid3/users/preilly/bin/FALCON_0.7.0/FALCON-integrate/FALCON/falcon_kit/mains/get_read_ctg_map.py", line 134, in main
    get_read_ctg_map(rawread_dir=rawread_dir, pread_dir=pread_dir, asm_dir=asm_dir)
  File "/Genomics/grid3/users/preilly/bin/FALCON_0.7.0/FALCON-integrate/FALCON/falcon_kit/mains/get_read_ctg_map.py", line 96, in get_read_ctg_map
    wf.refreshTargets() # block
  File "/Genomics/grid3/users/preilly/bin/FALCON_0.7.0/FALCON-integrate/pypeFLOW/pypeflow/simple_pwatcher_bridge.py", line 210, in refreshTargets
    self._refreshTargets(updateFreq, exitOnFailure)
  File "/Genomics/grid3/users/preilly/bin/FALCON_0.7.0/FALCON-integrate/pypeFLOW/pypeflow/simple_pwatcher_bridge.py", line 277, in _refreshTargets
    raise Exception(msg)
Exception: Some tasks are recently_done but not satisfied: set([Node(2-asm-falcon/read_maps/dump_rawread_ids)])

pb-cdunn commented 7 years ago

python setup.py install

It is possible that your system install is not being updated. You might need --force. Or, if you are using a virtualenv, just delete it and re-install everything.

I recommend using pip install --edit (the full spelling of the flag is --editable, or -e for short). That uses "edit" mode, which means that only a kind of symbolic link (egg-info) is installed. Then, whenever you update pure python code, you do not necessarily need to re-install.
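When a stale install is suspected, one quick check is to ask Python which file it would actually load for a given module name. The following is a small sketch in Python 3 (the thread itself uses Python 2.7, where importlib.util is not available); locate_module is a hypothetical helper, not part of FALCON or pypeFLOW.

```python
import importlib.util


def locate_module(name):
    """Return the file a module would be loaded from, or None if not found.

    For a dotted name, find_spec imports the parent package first, so a
    missing parent raises ImportError; that case is reported as None too.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # With the FALCON environment active, this would show whether the module
    # comes from the source checkout or from an older copy in site-packages.
    print(locate_module("falcon_kit.mains.get_read_ctg_map"))
```

If the printed path points somewhere other than the checkout you just updated, the interpreter is picking up a different copy than the one git log describes.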

pb-cdunn commented 7 years ago

I have reproduced this locally. I see the problem:

$ cat 2-asm-falcon/read_maps/dump_rawread_ids/task.json
{
    "inputs": {
        "rawread_db": "/home/UNIXHOME/cdunn/repo/localhost/unzip/iter/0-rawreads/raw_reads.db"
    },
    "outputs": {
        "rawread_id_file": "rawread_ids"
    },
    "parameters": {},
    "python_function": "__main__.dump_rawread_ids"
}

__main__ needs to be an actual module. Working on it...
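The general mechanism can be illustrated with a short Python 3 sketch. It is not pypeFLOW's actual loader; qualified_name and resolve are hypothetical helpers showing how a "module.function" string is typically recorded and later looked up. A function defined in a script run with python -m has __module__ set to "__main__", so the recorded name only makes sense inside that one process; a different process resolving "__main__.something" gets its own __main__ module, which does not define the function.

```python
import importlib


def qualified_name(func):
    """Record a function as 'module.function', as a task file might."""
    return "{}.{}".format(func.__module__, func.__name__)


def resolve(dotted):
    """Look up 'module.function' by importing the module part."""
    module_name, _, attr = dotted.rpartition(".")
    module = importlib.import_module(module_name)
    # Raises AttributeError if the module does not define the attribute.
    return getattr(module, attr)


if __name__ == "__main__":
    # A real module path resolves from any process.
    print(resolve("os.path.join"))
    # A '__main__' path depends on whichever script is currently running.
    try:
        resolve("__main__.dump_rawread_ids")
    except AttributeError as exc:
        print("lookup failed:", exc)
```

Recording the importable module path of the function instead of "__main__" avoids the problem, since that name resolves identically in every process.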

YourePrettyGood commented 7 years ago

Awesome, thanks!

pb-cdunn commented 7 years ago

But that's not the only problem. I will push an update to FALCON-integrate in a few minutes...

pb-cdunn commented 7 years ago

1.8.5 fixes a couple other things too. Passes Quiver for me now.