faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

phyluce_probe_strip_masked_loci_from_set error #182

Closed fanvanf closed 3 years ago

fanvanf commented 4 years ago

Hi Brant, I am trying to follow the phasing tutorial IV. When I run the phyluce_probe_strip_masked_loci_from_set script, I get the error shown below. I suspect something is wrong with the 2bit file format: I had to generate the 2bit file with the faToTwoBit -long parameter because the genome is larger than 16GB. How can I solve this problem?

Traceback (most recent call last):
  File "/home/anaconda3/envs/phyluce/bin/phyluce_probe_strip_masked_loci_from_set", line 122, in <module>
    main()
  File "/home/anaconda3/envs/phyluce/bin/phyluce_probe_strip_masked_loci_from_set", line 90, in main
    tb = twobit.TwoBitFile(file(args.twobit))
  File "/home/anaconda3/envs/phyluce/lib/python2.7/site-packages/bx/seq/twobit.py", line 68, in __init__
    raise Exception("File is version '%d' but I only know about '%d'" % (self.version, TWOBIT_VERSION))
Exception: File is version '1' but I only know about '0'

brantfaircloth commented 4 years ago

good morning,

The problem appears to be that the format of the 2bit file you created with -long is not what the module I'm using to read twobit files (bx-python) expects. There may be a workaround, but it probably involves using another twobit reader to read the file. You might be able to hack the code a little to use something like https://pythonhosted.org/twobitreader/twobitreader.html, which might work, although I have not tested it.
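For anyone who wants to confirm which version a given .2bit file reports before swapping readers, here is a minimal sketch that reads the header directly. It assumes the header layout described in the UCSC 2bit spec (four 32-bit fields: signature, version, sequence count, reserved). The function name `read_twobit_header` is made up for illustration and is not part of phyluce, bx-python, or twobitreader.

```python
import struct

TWOBIT_SIGNATURE = 0x1A412743  # magic number given in the UCSC 2bit spec


def read_twobit_header(path):
    """Return (byte_order, version, sequence_count) from a .2bit header.

    Hypothetical helper: it only inspects the first 16 bytes and does not
    validate the rest of the file.
    """
    with open(path, "rb") as handle:
        raw = handle.read(16)  # signature, version, sequenceCount, reserved
    if len(raw) < 16:
        raise ValueError("file too short to be a 2bit file")
    # The spec allows either byte order, so try little-endian, then big-endian.
    for byte_order in ("<", ">"):
        signature, version, seq_count, _reserved = struct.unpack(
            byte_order + "4I", raw
        )
        if signature == TWOBIT_SIGNATURE:
            return byte_order, version, seq_count
    raise ValueError("not a 2bit file (bad signature)")
```

Calling `read_twobit_header` on the file made with -long and getting a version of 1 would match the traceback above; readers that only accept version 0 will reject such a file.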

fanvanf commented 4 years ago

Good morning. Using another twobit reader, such as twobitreader, to read the 2bit file created with -long does not work either. I rechecked the .2bit format against the UCSC spec (http://genome.ucsc.edu/FAQ/FAQformat#format7): the version field in the header should be 0, but the file created with -long does not seem to be version 0. So 2bit files generated with the faToTwoBit -long parameter seem to be unusable here. Can I split this genome into 2 parts, or filter out duplicate sequences?

python -m twobitreader wh_long.2bit
twobitreader.TwoBitFileError: Invalid 2-bit file. File version in header should be 0.

brantfaircloth commented 4 years ago

You may be able to split the genome file in two... but you would also have to split the bait set files in two so that the contigs in the split genome file are the same as those in the split bait set files.
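A rough sketch of what that split could look like, in case it helps. It assumes the genome is a plain FASTA file and that the bait set file is BED-like, with the contig name in the first whitespace-separated column; neither assumption is confirmed in this thread. All function names are hypothetical, and the code targets Python 3.7+ (it relies on dicts keeping insertion order), not the Python 2.7 environment shown in the traceback. It does not check that each half ends up small enough for a version 0 .2bit file.

```python
def fasta_record_lengths(fasta_path):
    """Map each contig name (first word of the header) to its length, in file order."""
    lengths = {}
    name = None
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                name = line[1:].split()[0]
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line.strip())
    return lengths


def assign_parts(lengths):
    """Send contigs to part 0 until half of the bases are assigned, then to part 1."""
    half = sum(lengths.values()) / 2.0
    parts, running = {}, 0
    for name, length in lengths.items():
        parts[name] = 0 if running < half else 1
        running += length
    return parts


def split_fasta(fasta_path, parts, out_paths):
    """Write every FASTA record, unbroken, to the output file for its part."""
    outs = [open(path, "w") for path in out_paths]
    try:
        current = None
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    current = outs[parts[line[1:].split()[0]]]
                if current is not None:
                    current.write(line)
    finally:
        for out in outs:
            out.close()


def split_bed(bed_path, parts, out_paths):
    """Write each BED-like line to the part that holds its contig (assumed column 1)."""
    outs = [open(path, "w") for path in out_paths]
    try:
        with open(bed_path) as handle:
            for line in handle:
                fields = line.split()
                # Lines whose first column is not a known contig are skipped.
                if fields and fields[0] in parts:
                    outs[parts[fields[0]]].write(line)
    finally:
        for out in outs:
            out.close()
```

The same `parts` mapping drives both splits, which is what keeps the contigs in each genome half identical to the contigs in the matching bait set half. Each genome half would then still need its own 2bit conversion without -long.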