faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

running match_contigs_to_probes #24

Closed malivezey closed 9 years ago

malivezey commented 9 years ago

here is my directory structure:

ml@ml-laptop:~$ ls
anaconda   Desktop    Downloads         log       Music  output    Public     uce-2.5k-probes.fasta
anaconda3  Documents  examples.desktop  log-path  Myco   Pictures  Templates  Videos

here is my command: (first path is not needed!)

ml@ml-laptop:~$ /home/ml/anaconda/bin/match_contigs_to_probes.py Myco uce-5k-probes.fasta output log

here is the error message:

Traceback (most recent call last):
  File "/home/ml/anaconda/bin/match_contigs_to_probes.py", line 18, in <module>
    from phyluce import lastz
  File "/home/ml/anaconda/lib/python2.7/site-packages/phyluce/__init__.py", line 16, in <module>
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  File "/home/ml/anaconda/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/home/ml/anaconda/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

What is going wrong?

The folder 'Myco' contains assembled masked sequences from 1000 fungal genomes

tandermann commented 9 years ago

are your input files in the Myco folder named in this format: genus_species1.fasta, genus_species2.fasta etc.?

malivezey commented 9 years ago

Tobias, No sir. They are fasta files, yes. But not with the filename incremented as advised above. The .tar.gz files are also in the same folder. I thought it would read the first file with a fasta extension, and move to the 2nd. I will give this a try. Just a little coaching like this will be very helpful. Thanks very much for responding. I have great hopes for these programs even if the initial response of my instructors has been muted.

Thanks very much! Martin Livezey


malivezey commented 9 years ago

I find the same result regardless of what my fasta files are named. Here is Myco directory:

ml@ml-laptop:~/Myco$ ls -1
Antsi1_AssemblyScaffolds_Repeatmasked.fasta
Aspfl1_AssemblyScaffolds_Repeatmasked.fasta
Beaba1_AssemblyScaffolds_Repeatmasked.fasta
Clagr3_AssemblyScaffolds_Repeatmasked.fasta
Cormi1_AssemblyScaffolds_Repeatmasked.fasta
Exigl1_AssemblyScaffolds_Repeatmasked.fasta
Fompi3_AssemblyScaffolds_Repeatmasked.fasta
genus_species1.fasta
genus_species2.fasta
genus_species3.fasta
gz
HypCI4A_1_AssemblyScaffolds_Repeatmasked.fasta
Morco1_AssemblyScaffolds_Repeatmasked.fasta
Phchr2_AssemblyScaffolds_Repeatmasked.fasta
Puccinia_graminis.masked.fasta
Ramac1_AssemblyScaffolds_Repeatmasked.fasta
Tapde1_1_AssemblyScaffolds_Repeatmasked.fasta

Maybe I am using the match contigs command in the wrong way.

ml@ml-laptop:~$ match_contigs_to_probes.py Myco uce-5k-probes.fasta output log
Traceback (most recent call last):
  File "/home/ml/anaconda/bin/match_contigs_to_probes.py", line 18, in <module>
    from phyluce import lastz
  File "/home/ml/anaconda/lib/python2.7/site-packages/phyluce/__init__.py", line 16, in <module>
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  File "/home/ml/anaconda/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/home/ml/anaconda/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

produces the same error as:

ml@ml-laptop:~$ match_contigs_to_probes.py
Traceback (most recent call last):
  File "/home/ml/anaconda/bin/match_contigs_to_probes.py", line 18, in <module>
    from phyluce import lastz
  File "/home/ml/anaconda/lib/python2.7/site-packages/phyluce/__init__.py", line 16, in <module>
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  File "/home/ml/anaconda/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/home/ml/anaconda/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Is my setup wrong or is repository out-of-date?

tandermann commented 9 years ago

There is something wrong with your command. I'm sorry, I didn't pay too much attention to that before. Try this command (in your specific case):

match_contigs_to_probes.py --contigs Myco/ --probes uce-2.5k-probes.fasta --output output/ --log-path log/

I would try to clean the input folder up and only keep the target files in there (the fasta containing the contigs, nothing else). I wrote the whole workflow in one document to get a better overview and I hope I'm explaining the steps well enough to follow:

https://github.com/tobiashofmann88/NGS/wiki/UCE-tutorial

Also check the documentation on the phyluce GitHub; it covers many more options.

malivezey commented 9 years ago

Thank you again Tobias. By the time you wrote, we had figured out the mistake in the command, but the corrected command had the same result. We discovered that the program was not able to call phyluce! So either I have a path problem or I have somehow not installed the repository correctly, even though I ran update and upgrade and everything seemed to be fine. My new partner, who has more experience with Python and Linux, and I will continue to experiment. We know you have put a lot into it.

Since I have your attention, I would like to share my idea to see if you think it is scientifically valid. I want to use UCEs to test or support, and to build, phylogenetic trees of fungi. The standard PCR approach that relies on ITS, LSU, SSU, TEF1, and others works pretty well, especially in practical terms. But larger taxonomic issues remain, and they seem to pervade mycology in general. My hope is that using, say, 10,000 base pairs of regions adjacent to UCEs will give a better picture. I am planning to use assembled sequences from the JGI 1000 Fungal Genomes website (MycoCosm).

http://genome.jgi-psf.org/programs/fungi/index.jsf

Maybe we can develop a new probe set, based on the 5k one you have, that responsibly spans yeast to a fancy mushroom (Saccharomycotina to Agaricomycotina).

What do you think?

I am doing this as a part of a class, here is the description of the course from the catalog:

BIOF 521, Spring, 3 credits
Bioinformatics for Analysis of Data Generated by Next Generation Sequencing
Ben Busby, Sijung Yun*

In this course, students will learn to analyze next generation sequencing data, particularly for DNA-seq, RNA-seq, CHIP-seq, and DNA methylation. The course will be divided between lectures and hands-on sessions. Lectures will cover background knowledge and survey various software programs. For hands-on sessions, we will primarily focus on the use of the Galaxy platform for analysis of raw data obtained from an Illumina HiSeq-2000 and data available in the NCBI-SRA. Use of distributed and abstracted computing, such as Biowulf and cloud computing, will also be covered. There will be a term project in which students will design projects relevant to their research.

Learning objectives:
- Learn to analyze Next Generation Sequencing data including DNA-seq, RNA-seq, and CHIP-seq in a Graphical User Interface using Galaxy or on the command line.
- Write short scripts to do this analysis using command line resources.

Prerequisites: students will be expected to bring their own laptop to each session.

Here is a link to the school:

http://www.faes.org/grad

Here is why I am taking it: for the love of mushrooms!

http://mushroomobserver.org/observer/observations_by_user/2584

Thanks again very much for your time. Martin

malivezey commented 9 years ago

Hi Tobias, Please don't think I am a lost cause. I decided to test the program lines of match_contigs_to_probes.py, first line by line, then in the bash script shown below. The results seem to indicate that my path is not set up right, or that the repository, the program, or some dependency (for instance argparse) is out of date or improperly updated. The response to the bash script follows the "--log-path log" line below.

sudo apt-get install re
find re
find os
find sys
find glob
find copy
sudo apt-get install sqlite3
find sqlite3
import argparse
sudo apt-get install argparse
find argparse
find phyluce
from phyluce import lastz
find lastz
find phyluce.helpers
from phyluce.helpers import is_dir, is_file, FullPaths
find is_dir
find is_file
find FullPaths
find phyluce.log
from phyluce.log import setup_logging
find setup_logging
find collections
from collections import defaultdict
find defaultdict
find Bio
from Bio import SeqIO
find SeqIO
find pdb
import pdb
match_contigs_to_probes.py \
    --contigs /Myco \
    --probes uce-5k-probes.fasta \
    --output /output \
    --log-path log

ml@ml-laptop:~$ bash rmcont.foo
[sudo] password for ml:
Reading package lists... Done
Building dependency tree
Reading state information... Done
re is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 5 not upgraded.
re
os
sys
glob
copy
Reading package lists... Done
Building dependency tree
Reading state information... Done
sqlite3 is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 5 not upgraded.
sqlite3
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package argparse
argparse
find: `phyluce': No such file or directory
from: can't read /var/mail/phyluce
find: `lastz': No such file or directory
find: `phyluce.helpers': No such file or directory
from: can't read /var/mail/phyluce.helpers
find: `is_dir': No such file or directory
find: `is_file': No such file or directory
find: `FullPaths': No such file or directory
find: `phyluce.log': No such file or directory
from: can't read /var/mail/phyluce.log
find: `setup_logging': No such file or directory
find: `collections': No such file or directory
from: can't read /var/mail/collections
find: `defaultdict': No such file or directory
find: `Bio': No such file or directory
from: can't read /var/mail/Bio
find: `SeqIO': No such file or directory
find: `pdb': No such file or directory
import.im6: unable to grab mouse `': Resource temporarily unavailable @ error/xwindow.c/XSelectWindow/9047.
Traceback (most recent call last):
  File "/home/ml/anaconda/bin/match_contigs_to_probes.py", line 18, in <module>
    from phyluce import lastz
  File "/home/ml/anaconda/lib/python2.7/site-packages/phyluce/__init__.py", line 16, in <module>
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  File "/home/ml/anaconda/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/home/ml/anaconda/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
ml@ml-laptop:~$
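
A note on the output above: lines such as "from phyluce import lastz" and "import pdb" are Python statements. When they are run by bash, the shell starts unrelated programs called "from" (a mail utility, hence the /var/mail messages) and "import" (the ImageMagick screenshot tool, hence import.im6). Likewise, "find" looks for files in the current directory and apt-get looks for system packages, so neither says whether Python can import a module. A minimal sketch of the same check done from inside Python follows. It assumes Python 3 (the thread uses Python 2.7), the helper name module_available is made up for illustration, and only top-level module names are checked so that nothing is actually imported.

```python
import importlib.util


def module_available(name):
    """Return True if Python can locate a top-level module by name.

    The module is only located, not executed, so a package whose
    __init__.py fails on import is still reported as found.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        return False


# Top-level module names taken from the import lines in the script above.
modules = ["re", "os", "sys", "glob", "copy", "sqlite3", "argparse",
           "collections", "pdb", "phyluce", "Bio"]

for name in modules:
    status = "found" if module_available(name) else "MISSING"
    print("%-12s %s" % (name, status))
```

Run it with the same interpreter that runs match_contigs_to_probes.py (here, the one under /home/ml/anaconda/bin), since each Python installation has its own set of installed packages.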

malivezey commented 9 years ago

Tobias, I believe I am following your instructions, this is what I get:

here is my working folder:

ml@ml-laptop:~/contigs$ ls -l
total 63244
-rw-r--r-- 1 ml ml 11740461 Feb 15 19:58 genus_species1.fasta
-rw-r--r-- 1 ml ml 41529718 Feb 16 15:12 genus_species2.fasta
-rw-r--r-- 1 ml ml 10962406 Feb 16 15:12 genus_species3.fasta
drwxrwxr-x 2 ml ml     4096 Mar 30 20:26 log
drwxrwxr-x 2 ml ml     4096 Mar 30 20:33 mapped_uce
-rw-rw-r-- 1 ml ml   511332 Mar 18 18:58 uce-2.5k-probes.fasta

here is the response of the terminal:

ml@ml-laptop:~/contigs$ match_contigs_to_probes.py --contigs contigs/ --probes uce-2.5k-probes.fasta --output mapped_uce/ --log-path log
Traceback (most recent call last):
  File "/home/ml/anaconda/bin/match_contigs_to_probes.py", line 18, in <module>
    from phyluce import lastz
  File "/home/ml/anaconda/lib/python2.7/site-packages/phyluce/__init__.py", line 16, in <module>
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  File "/home/ml/anaconda/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/home/ml/anaconda/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
ml@ml-laptop:~/contigs$

tandermann commented 9 years ago

Hey Martin, I'm sorry for not responding to your previous posts; I'm quite busy at the moment. By the way, I was not part of the development of this software. I'm also a user, just like you. After spending some time with it I managed to get it working for my data, so I know the pain of running from one issue into the next and not knowing what's wrong. If you follow the instructions I wrote step by step, it should work (I made sure to write them in detail), assuming that everything is installed and set up correctly. I'm not exactly sure what issue you are running into. One thing I could think of:

Did you change your path from something like /usr/local/bin to /usr/local/anaconda/bin? You do that by editing the .bashrc file in your home directory (it's usually a hidden file). You have to change this line (or its equivalent):

export PATH="/usr/local/bin:/usr/local/jar:$PATH:/usr/local/bin"

into something along these lines:

export PATH="/usr/local/anaconda/bin:/usr/local/anaconda/jar:$PATH:/usr/local/bin"

Best, Tobi

tandermann commented 9 years ago

also add this line to your .bashrc file:

export CONDA_DEFAULT_ENV=/usr/local/anaconda
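
One way to confirm that an edit to .bashrc took effect is to open a new terminal and look at what is actually on PATH. The sketch below is only an illustration, assuming Python 3. The helper names are made up, and the anaconda location is taken from the traceback earlier in the thread (/home/ml/anaconda), so adjust it to wherever anaconda is installed.

```python
import os
import shutil


def path_entries(path=None):
    """Split a PATH-style string (default: the current PATH) into directories."""
    value = os.environ.get("PATH", "") if path is None else path
    return [entry for entry in value.split(os.pathsep) if entry]


def first_on_path(program, path=None):
    """Return the full path of the first matching executable on PATH, or None."""
    return shutil.which(program, path=path)


# Assumed install location, based on the traceback in this thread.
anaconda_bin = os.path.expanduser("~/anaconda/bin")

print("anaconda bin is on PATH:", anaconda_bin in path_entries())
for program in ("python", "match_contigs_to_probes.py"):
    print(program, "->", first_on_path(program))
```

If python resolves to a location outside the anaconda folder, the shell is still picking up a different installation ahead of the one phyluce was installed into.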

tandermann commented 9 years ago

The link to my instruction manual has changed and is now: https://github.com/tobiashofmann88/UCE-data-management/wiki

malivezey commented 9 years ago

Hey Tobi, Thanks so much for responding. I understand (and applaud) that you have other pressing issues in your life. Are you suggesting that the path be exactly the same? I am not at my Linux computer at the moment, but I did modify my .bashrc to follow the form you suggest (but for another path). Also, are you suggesting that the file names match exactly, or simply have exactly the same form, i.e. text1.fasta?

I have another SSD, so maybe I will clean it up and do the entire install process from the beginning. It should not take me long at this point. A friend did install KDE, so that may have gummed things up. Thanks, Martin


tandermann commented 9 years ago

The path does not have to be exactly the same; it depends on where the anaconda package is installed. You will have to check where anaconda is installed and put the bin folder inside the anaconda folder at the front of your PATH:

export PATH="/full/path/to/anaconda/bin:/full/path/to/anaconda/jar:$PATH:/full/path/to/user/local/bin"

For the other line I suggested adding, you also have to give the correct path to anaconda:

export CONDA_DEFAULT_ENV=/full/path/to/anaconda

The files only have to have the same form as in my example. What is vital is the 'underscore' in the name, so e.g. text_a1.fasta would work but not text1.fasta.

Best, Tobi
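
The naming rule described above can be checked before running anything. This is a minimal sketch, assuming the rule is exactly as stated in this comment (a .fasta extension and an underscore somewhere in the name). It has not been verified against the phyluce source, and the function name is hypothetical.

```python
import os


def has_expected_form(filename):
    """Check the naming form described above: *.fasta with an underscore in the stem."""
    stem, ext = os.path.splitext(filename)
    return ext == ".fasta" and "_" in stem


# Example names from this thread.
for name in ("genus_species1.fasta", "text_a1.fasta", "text1.fasta"):
    verdict = "ok" if has_expected_form(name) else "rename"
    print("%-24s %s" % (name, verdict))
```

To screen a whole folder, the same function can be applied to each name returned by os.listdir for the contigs directory.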

malivezey commented 9 years ago

Got it. Thanks


malivezey commented 9 years ago

This issue is not closed. It has not really even been responsibly answered. Thank you Tobias for explaining the path issue. Your comments are an exact duplicate of what is in the online instructions but it does not seem to help the program at all. The program still does not work, the documentation is scant, and the support is zero.

brantfaircloth commented 9 years ago

Hi Martin,

While it is possible to re-open the issue, I assumed that your "got it" comment meant you had things working. @tobiashofmann88 has done a nice job of trying to help you get things working, and he's gone above and beyond what most people would do. Thank you @tobiashofmann88.

Please avoid comments having to do with the "responsibility" of answering a question - none of us are paid to produce software or write documentation or provide support to users. Most of us are running full research ships on top of making code available for others to modify and use for their own purposes.

As Tobias mentioned, it appears you are having $PATH problems with your installation. If you take a look at the documentation (http://phyluce.readthedocs.org/en/latest/installation.html), it outlines reasonably clearly how to get the code installed using the conda package manager. The trickiest part is adding the anaconda distribution to your $PATH - additional details are here:

http://phyluce.readthedocs.org/en/latest/installation.html#checking-your-path

Once you have that working, it would probably be best to get a handle on how the software works by following the tutorial, which I have recently moved into this "stable" part of the repository. That is available here:

http://phyluce.readthedocs.org/en/latest/tutorial-one.html

Best of luck with your work.

malivezey commented 9 years ago

I greatly appreciate both your and Tobias's time, and I am willing to make a contribution to your research grant if I can get this running in a couple of days. I very strongly believe in the long-term value of your approach. I cannot understand why the idea of using UCEs and associated regions has not gained more traction. To me this is something any new student to the field is praying for every day: low-hanging fruit. You have done all the work. It simply needs to be applied now to an area that needs it very much. I believe it can make a big difference to issues of phylogeny in fungi, and please note, when I am successful, I have every intention of giving you both all of the credit you deserve. I am new, particularly to Linux, so if I am coming across as dense, just pitch me another bone. I have been hacking for at least two months on this and I don't think I have raised much of a fuss.

hk20013106 commented 9 years ago

Hello Martin,

I wonder if you have figured out the issue? I got exactly the same error message as yours. Thanks!

Kai

brantfaircloth commented 9 years ago

hi Kai,

try installing git for your distribution to see if that fixes the error for the 1.5 version.

cheers, b
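
The suggestion above fits the traceback posted throughout this thread: the failing call is subprocess.Popen in phyluce's __init__.py, and Popen raises OSError with Errno 2 when the program it is asked to start cannot be found. The sketch below reproduces that behaviour with a deliberately nonexistent program name and then checks whether git is on PATH. It assumes Python 3, and the helper name is made up. That the command phyluce runs is git is inferred from the suggestion above, not confirmed from the source.

```python
import errno
import shutil
import subprocess
import sys


def run_or_explain(cmd):
    """Run cmd and return its output, or a short message if the program is missing."""
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    except OSError as exc:
        # Errno 2 (ENOENT) is the "No such file or directory" seen in the thread.
        if exc.errno == errno.ENOENT:
            return "missing executable: %s" % cmd[0]
        raise
    out, _ = proc.communicate()
    return out.decode().strip()


# A program that does not exist triggers the same OSError as in the traceback.
print(run_or_explain(["no-such-program-xyz", "--version"]))

# A program that does exist runs normally.
print(run_or_explain([sys.executable, "--version"]))

# None here means git is not installed or not on PATH.
print("git on PATH:", shutil.which("git"))
```

Because the error is raised while phyluce is being imported, it appears regardless of the arguments given to match_contigs_to_probes.py, which matches the observation earlier in the thread that the command fails the same way with no arguments at all.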