EmilyPhelps closed this issue 5 years ago.
So it turns out I had installed a different version of ipyrad (I think 0.6.27) to try and get around the following error:
ImportError: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /newhome/ep15438/miniconda2/lib/python2.7/site-packages/pysam/libchtslib.so)
but I had forgotten about it because I've been struggling with various issues for weeks.
So I updated it, and now I am getting the above error again, and it isn't letting me write any files at all.
I then searched through here and did
conda install pysam=0.10.0 -c bioconda
conda install ipyrad -c ipyrad -f
and got the error
File "/newhome/ep15438/miniconda2/lib/python2.7/site-packages/pysam/__init__.py", line 5, in <module>
from pysam.libchtslib import *
repeatedly. Then it just doesn't run anything else. So I'm back to where I was 48 hours ago, with not a single step running! Have I cocked it up further? Help!
Try conda install -c ipyrad ipyrad -f. There is some weird dependency issue with the conda package right now that is causing this 0.6 rollback; using -f forces the most recent version.
As an alternative you can try the new v0.9 beta branch (much faster). You'll need the Python 3 version of miniconda, since the new ipyrad is Python 3. Once you install Python 3 with miniconda you can do: conda install -c bioconda -c ipyrad ipyrad. I would recommend trying this, as we are actively moving folks to the v0.9 version and will switch support away from the old branch at that point.
Thank you! I just did that and then tried to run ipyrad, and got the error message:
(base)$ ipyrad -p params-ip_MG_test.txt -s 67 -r -f
loading Assembly: ip_MG_test
from saved path: ~/Potamonautes/ip_MG_test.json
Warning: The format of max_Ns_consens should now be a float and was set on load to 0.05
Warning: The format of max_Hs_consens should now be a float and was set on load to 0.05
Traceback (most recent call last):
File "/newhome/ep15438/miniconda3/bin/ipyrad", line 10, in <module>
sys.exit(main())
File "/newhome/ep15438/miniconda3/lib/python3.7/site-packages/ipyrad/__main__.py", line 617, in main
CLI()
File "/newhome/ep15438/miniconda3/lib/python3.7/site-packages/ipyrad/__main__.py", line 69, in __init__
self.get_assembly()
File "/newhome/ep15438/miniconda3/lib/python3.7/site-packages/ipyrad/__main__.py", line 368, in get_assembly
data.set_params(key, param)
File "/newhome/ep15438/miniconda3/lib/python3.7/site-packages/ipyrad/core/assembly.py", line 476, in set_params
setattr(self.params, param, newvalue)
File "/newhome/ep15438/miniconda3/lib/python3.7/site-packages/ipyrad/core/params.py", line 551, in max_Ns_consens
value = float(value)
ValueError: could not convert string to float: '5, 5'
My params file is the same as above.
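The warnings above show what changed: the old params format gave max_Ns_consens as two integer counts ('5, 5' for R1/R2), while v0.9 expects a single float such as 0.05, so the loader's float() call fails on the old string. A minimal Python reproduction, for illustration only (not ipyrad code):

# The v0.9 loader converts the param with float(); the old two-value string cannot be parsed.
old_value = "5, 5"    # old-format max_Ns_consens (separate R1 and R2 values)
new_value = "0.05"    # v0.9-format max_Ns_consens (a single float)

float(new_value)      # works: 0.05
float(old_value)      # raises ValueError: could not convert string to float: '5, 5'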
The params file format is slightly different; it's easiest just to create a new one. You'll also have to roll back to step 3, as many of the internal file formats are quite different to support the massive performance boost of the new version.
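The standard way to get a fresh v0.9-format params file is ipyrad -n <assembly_name> on the command line, which writes params-<assembly_name>.txt in the current directory. A rough equivalent through the Python API, sketched with method names I believe exist in v0.9 (check the docs if they differ):

import ipyrad as ip

# Create a new v0.9 Assembly object and write a fresh params file for it.
# "ip_MG_test" is the assembly name used in this thread; write_params() is
# assumed here; the CLI command ipyrad -n ip_MG_test does the same job.
data = ip.Assembly("ip_MG_test")
data.write_params("params-ip_MG_test.txt")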
So I have made a new params file:
------- ipyrad params file (v.0.9.11)-------------------------------------------
ip_MG_test ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/newhome/ep15438/Potamonautes/ipyrad3 ## [1] [project_dir]: Project dir (made in curdir if not present)
## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
## [3] [barcodes_path]: Location of barcodes file
./Test/*.fq.gz ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference)
## [6] [reference_sequence]: Location of reference sequence file
pairddrad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
5 ## [11] [mindepth_statistical]: Min depth for statistical base calling
3 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
0 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
0.05 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus (R1, R2)
0.05 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus (R1, R2)
25 ## [21] [min_samples_locus]: Min # samples per locus for output
0.2 ## [22] [max_SNPs_locus]: Max # SNPs per locus (R1, R2)
8 ## [23] [max_Indels_locus]: Max # of indels per locus (R1, R2)
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus (R1, R2)
0, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
* ## [27] [output_formats]: Output formats (see docs)
## [28] [pop_assign_file]: Path to population assignment file
## [29] [reference_as_filter]: Reads mapped to this reference are removed in step 3
Then it ran to step 3 successfully, but during step 4 I got the error message "tuple index out of range".
Step 4: Joint estimation of error rate and heterozygosity
[####################] 100% 0:00:05 | inferring [H, E]
Encountered an Error.
Message: IndexError: tuple index out of range
Use debug flag (-d) for full code traceback.
Sorry for all the comments!
The debug flag gave me:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<string> in <module>
~/miniconda3/lib/python3.7/site-packages/ipyrad/assemble/jointestimate.py in optim(data, sample)
407 try:
408 ## get array of all clusters data
--> 409 stacked = stackarray(data, sample)
410
411 ## get base frequencies
~/miniconda3/lib/python3.7/site-packages/ipyrad/assemble/jointestimate.py in stackarray(data, sample)
337 try:
338 cutlens[0] = len(data.params.restriction_overhang[0])
--> 339 cutlens[1] = maxlen - len(data.params.restriction_overhang[1])
340 except TypeError:
341 pass
IndexError: tuple index out of range
Well, it looks like you haven't specified anything for the restriction_overhang parameter. Since you have pairddrad you need to put in both overhang sequences.
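That matches the traceback: with pairddrad data the code indexes a second cutter, and the surrounding try/except only catches TypeError, so a restriction_overhang with fewer than two entries raises the uncaught IndexError. A stripped-down illustration (not the actual ipyrad code):

# With only one overhang set, indexing the second cutter fails, which is
# exactly the "tuple index out of range" seen in stackarray() above.
restriction_overhang = ("CCGG",)   # cut1 only; pairddrad needs (cut1, cut2)
maxlen = 85                        # arbitrary read length for the illustration

cutlens = [None, None]
cutlens[0] = len(restriction_overhang[0])
cutlens[1] = maxlen - len(restriction_overhang[1])   # IndexError: tuple index out of range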
Hello, thanks for all your help so far. Hopefully this is the last issue I have:
Step 7: Filtering and formatting output files
[####################] 100% 0:00:10 | applying filters
[####################] 100% 0:00:14 | building arrays
Encountered an Error.
Message: ValueError: not enough values to unpack (expected 2, got 1)
Parallel connection closed.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<string> in <module>
~/miniconda3/lib/python3.7/site-packages/ipyrad/assemble/write_outputs.py in write_loci_and_alleles(data)
1604
1605 all1, all2 = splitalleles(seq)
-> 1606 aname, spacer = name.split(" ", 1)
1607 achunk.append(aname + "_0 " + spacer + all1)
1608 achunk.append(aname + "_1 " + spacer + all2)
ValueError: not enough values to unpack (expected 2, got 1)
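Mechanically, the failure is in the name.split(" ", 1) line: splitting on a single space only returns two fields when the line actually contains a space, so an empty or malformed name line breaks the two-target unpack. A minimal illustration (not ipyrad code):

# split(" ", 1) yields two fields only if a space is present in the line.
name_ok = "MG_Sample10      "            # padded sample name, as in a .loci chunk
aname, spacer = name_ok.split(" ", 1)    # unpacks fine

name_bad = ""                            # e.g. a line read from an empty chunk file
aname, spacer = name_bad.split(" ", 1)   # ValueError: not enough values to unpack (expected 2, got 1)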
My parameter file currently looks like this:
------- ipyrad params file (v.0.9.11)-------------------------------------------
ip_MG_test ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/newhome/ep15438/Potamonautes/ipyrad3 ## [1] [project_dir]: Project dir (made in curdir if not present)
## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
## [3] [barcodes_path]: Location of barcodes file
./Test/*.fq.gz ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference)
## [6] [reference_sequence]: Location of reference sequence file
pairddrad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
CCGG, AATT ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
5 ## [11] [mindepth_statistical]: Min depth for statistical base calling
3 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
0 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
0.05 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus (R1, R2)
0.05 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus (R1, R2)
15 ## [21] [min_samples_locus]: Min # samples per locus for output
0.2, 0.2 ## [22] [max_SNPs_locus]: Max # SNPs per locus (R1, R2)
8, 8 ## [23] [max_Indels_locus]: Max # of indels per locus (R1, R2)
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus (R1, R2)
0, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
* ## [27] [output_formats]: Output formats (see docs)
## [28] [pop_assign_file]: Path to population assignment file
## [29] [reference_as_filter]: Reads mapped to this reference are removed in step 3
Thanks again for all your help, I really appreciate it!
Can I see the first few lines of your barcodes file? Or just show me 4 or 5 different representative sample names.
My sample names are:
MG_Sample10_R1_.fq.gz
MG_Sample10_R2_.fq.gz
MG_Sample13_R1_.fq.gz
MG_Sample13_R2_.fq.gz
MG_Sample18_R1_.fq.gz
MG_Sample18_R2_.fq.gz
MG_Sample19_R1_.fq.gz
MG_Sample19_R2_.fq.gz
There are 50 samples; the numbering is not consecutive because some samples were already removed.
This is what these files look like inside
[unreadable binary content from the gzipped .fq.gz file]
When expanded, the files look like this (this is MG_Sample3R2.fq):
@1_11101_11247_1004_2
CGGGAAGAAAGTTCTGTGACGGGGTAGACGCTTAAGTTGAATGTCCTGAAAACTTACCACTGGCGTAGGTAAAAGAATTGATCG
+
AA6AAEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEA
@1_11101_7843_1010_2
CGGTGTCTGAGAACGAGACACCA
Sorry if this is a really basic error!
Hiya! I was having another think about this today and I changed [max_shared_Hs_locus] to have two numbers, so from:
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus (R1, R2)
to:
0.5 , 0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus (R1, R2)
I got the error message
ValueError: could not convert string to float: '0.5, 0.5'
Could this be the issue? Is there a quick fix for this?
This parameter only accepts one value in the 0.9 branch.
Okay thanks; so I'm back to this ValueError: not enough values to unpack (expected 2, got 1) error :/
When you switched to v0.9 did you re-run step 6?
Yep, I re-ran the whole pipeline just in case.
Hm, well there's something unusual going on. Any way you can dropbox me the raw data and the barcodes file?
During the filtering/chunking phase, if no loci in a chunk pass min_samples_locus filtering then it creates an empty .loci file, which cascades down to this error in a very non-straightforward fashion. Maybe chunks with no loci passing filtering just shouldn't be written to .loci files?
Fixed: 2aebb35
Handle chunks with no loci passing filtering. Don't write out tmpfiles for .loci/.p/.npy in the tmpdir.
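A rough sketch of the kind of guard described above; the function and variable names are illustrative assumptions, not the actual code from commit 2aebb35:

def write_chunk_outputs(passing_loci, loci_path):
    # Chunks in which no loci survived min_samples_locus filtering used to
    # produce empty .loci tmpfiles; skipping them avoids the downstream
    # "not enough values to unpack" error when those empty files are re-parsed.
    if not passing_loci:
        return
    with open(loci_path, "w") as out:
        out.write("\n".join(passing_loci) + "\n")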
I know this has come up a few times before. I'm having a lot of trouble with ipyrad recently, although I used it for the first time back in March and it worked perfectly on my computer. I am now trying to run it on a Linux server, and I have installed/updated it this week.
It all goes smoothly until step 7, then I am getting:
I have tried to reduce my min_samples_locus from 25 to 4 and it still isn't working, but previously I had it working at 26! I haven't bothered with the pop file since I saw this was causing issues.
Here is my parameters file.
Let me know if I can give any more information!!