faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

internal trimming - no such file or directory #279

Closed crcardenas closed 1 year ago

crcardenas commented 1 year ago

I previously had some issues with sequence names while concatenating multiple datasets to find UCE loci, and had mostly overcome them. However, I am now having an issue with phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed.

Here is my command, pretty standard:

phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed \
    --alignments mafft-nexus-internal-trimmed \
    --output mafft-nexus-internal-trimmed-gblocks \
    --cores 8 \
    --log log

It runs until it gets to a particular UCE (uce-223111.fasta); I have abbreviated the progress dots here.

2022-07-26 10:28:09,778 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO -  Starting phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed
2022-07-26 10:28:09,779 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Version: 1.7.1
2022-07-26 10:28:09,779 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Commit: None
2022-07-26 10:28:09,779 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Argument --alignments: /data/work/Calosoma_phylo/phylogeny/subset1/taxon-sets/all/mafft-nexus-internal-trimmed
2022-07-26 10:28:09,779 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Argument --b1: 0.5
2022-07-26 10:28:09,779 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Argument --b2: 0.85
2022-07-26 10:28:09,779 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Argument --b3: 8
2022-07-26 10:28:09,780 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Argument --b4: 10
2022-07-26 10:28:09,780 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Argument --cores: 8
2022-07-26 10:28:09,780 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Argument --input_format: fasta
2022-07-26 10:28:09,780 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Argument --log_path: /data/work/Calosoma_phylo/phylogeny/subset1/taxon-sets/all/log
2022-07-26 10:28:09,780 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Argument --output: /data/work/Calosoma_phylo/phylogeny/subset1/taxon-sets/all/mafft-nexus-internal-trimmed-gblocks
2022-07-26 10:28:09,780 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Argument --output_format: nexus
2022-07-26 10:28:09,780 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Argument --verbosity: INFO
2022-07-26 10:28:09,780 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Getting aligned sequences for trimming
2022-07-26 10:28:09,822 - phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed - INFO - Alignment trimming begins.
...multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/cody/.conda/envs/phyluce-1.7.1_py3.6/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/cody/.conda/envs/phyluce-1.7.1_py3.6/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/cody/.conda/envs/phyluce-1.7.1_py3.6/bin/phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed", line 141, in get_and_trim_alignments
    with open(trimmed_aln_file, "rU") as trimmed_aln:
FileNotFoundError: [Errno 2] No such file or directory: '/data/work/Calosoma_phylo/phylogeny/subset1/taxon-sets/all/mafft-nexus-internal-trimmed/uce-223111.fasta-gb'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cody/.conda/envs/phyluce-1.7.1_py3.6/bin/phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed", line 224, in <module>
    main()
  File "/home/cody/.conda/envs/phyluce-1.7.1_py3.6/bin/phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed", line 208, in main
    alignments = pool.map(get_and_trim_alignments, params)
  File "/home/cody/.conda/envs/phyluce-1.7.1_py3.6/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/cody/.conda/envs/phyluce-1.7.1_py3.6/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: '/data/work/Calosoma_phylo/phylogeny/subset1/taxon-sets/all/mafft-nexus-internal-trimmed/uce-223111.fasta-gb'

I first tried building a fresh environment (with --verbosity CRITICAL), but I get the same error:

FileNotFoundError: [Errno 2] No such file or directory: '/data/work/Calosoma_phylo/phylogeny/subset1/taxon-sets/all/mafft-nexus-internal-trimmed/uce-223111.fasta-gb'

Next, I tried running gblocks on the fasta causing the issue and another random one.

$ Gblocks uce-223111.fasta -t DNA -b1=0.5 -b2=0.85 -b3=8 -b4=8 -b5=h -p=n

Sequence name too long

Execution terminated

and it was successful with the other fasta file.

$ Gblocks uce-48.fasta -t DNA -b1=0.5 -b2=0.85 -b3=8 -b4=8 -b5=h -p=n

152 sequences and 7904 positions in the first alignment file:
uce-48.fasta

WARNING: Parameter -b1 not properly entered.
The minimum number of sequences for a conserved position must be bigger than or equal to 77 (half the number of sequences + 1)

WARNING: minimum number of sequences for a flank position set to the minimum possible value.
uce-48.fasta
Original alignment: 7904 positions
Gblocks alignment:  599 positions (7 %) in 22 selected block(s)

My next strategy was to move the problematic FASTA out of its directory, but the run then stopped on a new problematic file. I copied that file and tested it with Gblocks directly:

$ Gblocks uce-138187.fasta -t DNA -b1=0.5 -b2=0.85 -b3=8 -b4=8 -b5=h -p=n

Sequence name too long

Execution terminated

You can see part of the aforementioned naming issues in the names of the first 20 sequences of each file:

uce-223111.fasta

>uce-223111_Adalia_bipunctata_GCA_910592335 |uce-223111
>_R_uce-223111_Agabetes_acuductus_SRR10334071 |uce-223111
>_R_uce-223111_Aglymbus_sp_SRR10334049 |uce-223111
>uce-223111_Agrypnus_murinus_GCA_929113105 |uce-223111
>uce-223111_Amphizoa_insolens_SRR8518617 |uce-223111
>_R_uce-223111_Brachinus_cyanipennis_SRR10334068 |uce-223111
>_R_uce-223111_Broscus_cephalotes_SRR12339069 |uce-223111
>_R_uce-223111_Brychius_pacificus_SRR10334074 |uce-223111
>_R_uce-223111_Calophaena_bicincta_SRR12339143 |uce-223111
>uce-223111_Calosoma_granatense_GCA_022063505 |uce-223111
>_R_uce-223111_Calosoma_inquisitor_DRR295716_spades |uce-223111
>_R_uce-223111_Camedula_marginale_DRR295725_spades |uce-223111
>uce-223111_Cantharis_rustica_GCA_911387805 |uce-223111
>_R_uce-223111_Canthydrus_sp_SRR12339142 |uce-223111
>_R_uce-223111_Carabus_amplipennis_DRR295686_spades |uce-223111
>_R_uce-223111_Carabus_asperatus_DRR295671_spades |uce-223111
>_R_uce-223111_Carabus_coriaceus_DRR295699_spades |uce-223111
>_R_uce-223111_Carabus_crassesculptus_DRR295697_spades |uce-223111
>_R_uce-223111_Carabus_creutzeri_DRR295680_spades |uce-223111
>_R_uce-223111_Carabus_exiguus_DRR295678_spades |uce-223111

uce-138187.fasta

>uce-138187_Adalia_bipunctata_GCA_910592335 |uce-138187
>_R_uce-138187_Adelotopus_paroensis_SRR12339051 |uce-138187
>_R_uce-138187_Aglymbus_sp_SRR10334049 |uce-138187
>_R_uce-138187_Agrypnus_murinus_GCA_929113105 |uce-138187
>_R_uce-138187_Amphizoa_insolens_SRR5930489 |uce-138187
>_R_uce-138187_Anomala_sp_SRR2083625 |uce-138187
>_R_uce-138187_Aspidytes_niobe_SRR5889386 |uce-138187
>_R_uce-138187_Australodrepa_schayeri_DRR295714_spades |uce-138187
>_R_uce-138187_Bembidion_corgenoma_SRR8801541 |uce-138187
>_R_uce-138187_Brachinus_cyanipennis_SRR10334068 |uce-138187
>_R_uce-138187_Broscus_cephalotes_SRR12339069 |uce-138187
>_R_uce-138187_Brychius_pacificus_SRR10334074 |uce-138187
>_R_uce-138187_Calathus_sp_SRR12339144 |uce-138187
>_R_uce-138187_Calophaena_bicincta_SRR12339143 |uce-138187
>_R_uce-138187_Calosoma_granatense_GCA_022063505 |uce-138187
>_R_uce-138187_Calosoma_inquisitor_DRR295716_spades |uce-138187
>_R_uce-138187_Cantharis_rustica_GCA_911387805 |uce-138187
>_R_uce-138187_Canthydrus_sp_SRR12339142 |uce-138187
>_R_uce-138187_Carabomorphus_brachycerum_DRR295721_spades |uce-138187
>_R_uce-138187_Carabophanus_gestroi_DRR295720_spades |uce-138187

It seems Gblocks has a limit on the length of sequence names and, when a name exceeds it, fails to produce the *.fasta-gb file.
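Rather than finding the affected files one at a time, the header lengths can be checked for the whole directory up front. The following is a minimal sketch, not part of phyluce: the function names are made up for illustration, and `MAX_NAME_LEN` is a placeholder, because the exact limit Gblocks enforces is not stated in this thread.

```python
import sys
from pathlib import Path

# Hypothetical threshold: the real Gblocks limit is not given in this thread,
# so treat this number as a placeholder and adjust it.
MAX_NAME_LEN = 50


def header_lengths(fasta_text):
    """Return (header, length) for every '>' line, not counting the '>'."""
    results = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].rstrip()
            results.append((header, len(header)))
    return results


def files_with_long_headers(directory, max_len=MAX_NAME_LEN):
    """Map each *.fasta file name to its longest header if it exceeds max_len."""
    flagged = {}
    for path in sorted(Path(directory).glob("*.fasta")):
        lengths = header_lengths(path.read_text())
        if not lengths:
            continue
        longest = max(lengths, key=lambda item: item[1])
        if longest[1] > max_len:
            flagged[path.name] = longest
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_headers.py mafft-nexus-internal-trimmed
    for name, (header, length) in files_with_long_headers(sys.argv[1]).items():
        print(f"{name}\t{length}\t{header}")
```

Files reported here are candidates for renaming before the Gblocks step; files that are not reported may still fail if the placeholder threshold is set too high.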

Either way, I started to remove more FASTA files that caused this issue, and kept running into the same problem:

$ Gblocks uce-155796.fasta -t DNA -b1=0.5 -b2=0.85 -b3=8 -b4=8 -b5=h -p=n

Sequence name too long

Execution terminated

uce-155796.fasta

>uce-155796_Adalia_bipunctata_GCA_910592335 |uce-155796
>_R_uce-155796_Agabetes_acuductus_SRR10334071 |uce-155796
>_R_uce-155796_Agabus_undulatus_SRR12339124 |uce-155796
>_R_uce-155796_Aglymbus_sp_SRR10334049 |uce-155796
>_R_uce-155796_Agrypnus_murinus_GCA_929113105 |uce-155796
>_R_uce-155796_Algophilus_lathridioides_SRR12339113 |uce-155796
>_R_uce-155796_Amphizoa_insolens_SRR5930489 |uce-155796
>_R_uce-155796_Amphizoa_insolens_SRR8518617 |uce-155796
>_R_uce-155796_Anodocheilus_exiguus_SRR10334057 |uce-155796
>_R_uce-155796_Anomala_sp_SRR2083625 |uce-155796
>_R_uce-155796_Apoderus_coryli_GCA_911728435 |uce-155796
>_R_uce-155796_Aspidytes_niobe_SRR5889386 |uce-155796
>_R_uce-155796_Australodrepa_schayeri_DRR295714_spades |uce-155796
>_R_uce-155796_Batrachomatus_nannup_SRR5892099 |uce-155796
>_R_uce-155796_Bembidion_corgenoma_SRR8801541 |uce-155796
>_R_uce-155796_Brachinus_cyanipennis_SRR10334068 |uce-155796
>_R_uce-155796_Brychius_pacificus_SRR10334074 |uce-155796
>_R_uce-155796_Callistenia_subaeneum_DRR295727_spades |uce-155796
>_R_uce-155796_Calosoma_frigidum_SRR2083640 |uce-155796
>_R_uce-155796_Calosoma_granatense_GCA_022063505 |uce-155796

I can keep going, but that would be an exercise in futility and a waste of time, as I keep finding more.

I'd appreciate any help!

While I wait I am going to try renaming the headers (removing "_spades" at least).

Two things to note that may or may not be relevant. phyluce_align_seqcap_align did not have any issues with the file names and ran fine with this command:

phyluce_align_seqcap_align \
    --input all-taxa-incomplete.fasta \
    --output mafft-nexus-internal-trimmed \
    --taxa 324 \
    --aligner mafft \
    --cores 8 \
    --incomplete-matrix \
    --output-format fasta \
    --no-trim \
    --log-path log

I will also note that typing "gblocks" does not call the program; only "Gblocks" does, even though the package is listed in my environment as gblocks (both environments are identical).

# packages in environment at /home/cody/.conda/envs/phyluce-1.7.1:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
abyss                     1.5.2               boost1.61_5    bioconda
alsa-lib                  1.2.3                h516909a_0    conda-forge
amply                     0.1.4                      py_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
attrs                     20.3.0             pyhd3deb0d_0    conda-forge
bcftools                  1.11                 h7c999a4_0    bioconda
bedtools                  2.30.0               hc088bd4_0    bioconda
biopython                 1.78             py36h8f6f2f9_2    conda-forge
brotlipy                  0.7.0           py36h8f6f2f9_1001    conda-forge
bwa                       0.7.17               hed695b0_7    bioconda
bx-python                 0.8.9            py36h5e0341f_2    bioconda
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.17.1               h7f98852_1    conda-forge
ca-certificates           2020.12.5            ha878542_0    conda-forge
cairo                     1.16.0            h7979940_1007    conda-forge
certifi                   2020.12.5        py36h5fab9bb_1    conda-forge
cffi                      1.14.5           py36hc120d54_0    conda-forge
chardet                   4.0.0            py36h5fab9bb_1    conda-forge
coincbc                   2.10.5               hcee13e7_1    conda-forge
colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
commonmark                0.9.1                      py_0    conda-forge
configargparse            1.3                pyhd8ed1ab_0    conda-forge
cryptography              3.4.6            py36hb60f036_0    conda-forge
dataclasses               0.7                pyhe4b4509_6    conda-forge
datrie                    0.8.2            py36h8c4c3a4_1    conda-forge
decorator                 4.4.2                      py_0    conda-forge
dendropy                  4.5.2              pyh3252c3a_0    bioconda
docutils                  0.16             py36h5fab9bb_3    conda-forge
fontconfig                2.13.1            hba837de_1004    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
future                    0.18.2           py36h5fab9bb_3    conda-forge
gblocks                   0.91b                         1    bioconda
gettext                   0.19.8.1          h0b5b191_1005    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
gitdb                     4.0.5              pyhd8ed1ab_1    conda-forge
gitpython                 3.1.14             pyhd8ed1ab_0    conda-forge
gmp                       6.2.1                h58526e2_0    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
gsl                       2.6                  he838d99_2    conda-forge
harfbuzz                  2.7.4                h5cf4720_0    conda-forge
htslib                    1.11                 hd3b49d5_2    bioconda
icu                       68.1                 h58526e2_0    conda-forge
idna                      2.10               pyh9f0ad1d_0    conda-forge
illumiprocessor           2.10                       py_0    faircloth-lab/label/phyluce
importlib-metadata        3.7.2            py36h5fab9bb_0    conda-forge
importlib_metadata        3.7.2                hd8ed1ab_0    conda-forge
iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
jpeg                      9d                   h36c2ea0_0    conda-forge
jsonschema                3.2.0              pyhd8ed1ab_3    conda-forge
jupyter_core              4.7.1            py36h5fab9bb_0    conda-forge
krb5                      1.17.2               h926e7f8_0    conda-forge
lastz                     1.0.4                h516909a_4    bioconda
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.35.1               hea4e1c9_2    conda-forge
libblas                   3.9.0                8_openblas    conda-forge
libcblas                  3.9.0                8_openblas    conda-forge
libcurl                   7.75.0               hc4aaa36_0    conda-forge
libdeflate                1.7                  h7f98852_5    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc                    7.2.0                h69d50b8_2    conda-forge
libgcc-ng                 9.3.0               h2828fa1_18    conda-forge
libgfortran-ng            9.3.0               hff62375_18    conda-forge
libgfortran5              9.3.0               hff62375_18    conda-forge
libglib                   2.66.7               h3e27bee_1    conda-forge
libgomp                   9.3.0               h2828fa1_18    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
libidn2                   2.3.0                h516909a_0    conda-forge
liblapack                 3.9.0                8_openblas    conda-forge
libnghttp2                1.43.0               h812cca2_0    conda-forge
libopenblas               0.3.12          pthreads_h4812303_1    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libssh2                   1.9.0                ha56f1ee_6    conda-forge
libstdcxx-ng              9.3.0               h6de172a_18    conda-forge
libtiff                   4.2.0                hdc55705_0    conda-forge
libunistring              0.9.10               h14c3975_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libwebp-base              1.2.0                h7f98852_0    conda-forge
libxcb                    1.13              h7f98852_1003    conda-forge
libxml2                   2.9.10               h72842e0_3    conda-forge
llvm-meta                 7.0.0                         0    conda-forge
lz4-c                     1.9.3                h9c3ff4c_0    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
mafft                     7.475                h516909a_0    bioconda
make                      4.3                  hd18ef5c_1    conda-forge
more-itertools            8.7.0              pyhd8ed1ab_0    conda-forge
mpi                       1.0                     openmpi    conda-forge
muscle                    3.8.1551             hc9558a2_5    bioconda
nbformat                  5.1.2              pyhd8ed1ab_1    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
numpy                     1.19.5           py36h2aa4a07_1    conda-forge
openjdk                   11.0.8               hacce0ff_0    conda-forge
openmp                    7.0.0                h2d50403_0    conda-forge
openmpi                   4.0.5                h9b22176_4    conda-forge
openssl                   1.1.1j               h7f98852_0    conda-forge
packaging                 20.9               pyh44b312d_0    conda-forge
pandas                    1.1.5            py36h284efc9_0    conda-forge
pcre                      8.44                 he1b5a44_0    conda-forge
perl                      5.32.0               h36c2ea0_0    conda-forge
phyluce                   1.7.1                    py36_0    faircloth-lab/label/phyluce
pilon                     1.23                          2    bioconda
pip                       21.0.1             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
pluggy                    0.13.1           py36h5fab9bb_4    conda-forge
psutil                    5.8.0            py36h8f6f2f9_1    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pulp                      2.4              py36h5fab9bb_0    conda-forge
py                        1.10.0             pyhd3deb0d_0    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pygments                  2.8.1              pyhd8ed1ab_0    conda-forge
pyopenssl                 20.0.1             pyhd8ed1ab_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyrsistent                0.17.3           py36h8f6f2f9_2    conda-forge
pysocks                   1.7.1            py36h5fab9bb_3    conda-forge
pytest                    6.2.2            py36h5fab9bb_0    conda-forge
python                    3.6.13          hffdb5ce_0_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python-lzo                1.12            py36hbaba66d_1003    conda-forge
python_abi                3.6                     1_cp36m    conda-forge
pytz                      2021.1             pyhd8ed1ab_0    conda-forge
pyyaml                    5.4.1            py36h8f6f2f9_0    conda-forge
ratelimiter               1.2.0                   py_1002    conda-forge
raxml-ng                  1.0.1                h7447c1b_0    bioconda
readline                  8.0                  he28a2e2_2    conda-forge
requests                  2.25.1             pyhd3deb0d_0    conda-forge
rich                      9.2.0            py36h5fab9bb_0    conda-forge
samtools                  1.11                 h6270b1f_0    bioconda
seqtk                     1.3                  hed695b0_2    bioconda
setuptools                49.6.0           py36h5fab9bb_3    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
smmap                     3.0.5              pyh44b312d_0    conda-forge
snakemake-minimal         5.32.2                     py_0    bioconda
spades                    3.14.1               h2d02072_1    bioconda
sqlite                    3.34.0               h74cdb3f_0    conda-forge
tk                        8.6.10               h21135ba_1    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
toposort                  1.6                pyhd8ed1ab_0    conda-forge
traitlets                 4.3.3            py36h9f0ad1d_1    conda-forge
trimal                    1.4.1                hc9558a2_4    bioconda
trimmomatic               0.39                          1    bioconda
typing_extensions         3.7.4.3                    py_0    conda-forge
urllib3                   1.26.3             pyhd8ed1ab_0    conda-forge
velvet                    1.2.10               hed695b0_3    bioconda
wget                      1.20.1               h22169c7_0    conda-forge
wheel                     0.36.2             pyhd3deb0d_0    conda-forge
wrapt                     1.12.1           py36h8f6f2f9_3    conda-forge
xorg-fixesproto           5.0               h14c3975_1002    conda-forge
xorg-inputproto           2.3.2             h7f98852_1002    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.6.12               h516909a_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h516909a_0    conda-forge
xorg-libxfixes            5.0.3             h516909a_1004    conda-forge
xorg-libxi                1.7.10               h516909a_0    conda-forge
xorg-libxrender           0.9.10            h516909a_1002    conda-forge
xorg-libxtst              1.2.3             h516909a_1002    conda-forge
xorg-recordproto          1.14.2            h516909a_1002    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
yaml                      0.2.5                h516909a_0    conda-forge
zipp                      3.4.1              pyhd8ed1ab_0    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge
zstd                      1.4.9                ha95c52a_0    conda-forge

brantfaircloth commented 1 year ago

These are limitations of Gblocks, as far as I can tell. And, because the Gblocks source code is not open, we cannot change it to fix this. That said, there are some workarounds.

First, you can strip the UCE names from all of your loci (phyluce_align_remove_locus_name_from_files); that will remove the locus name characters from each line. To remove the "spades" bit, you could use sed or similar. You can also filter your loci with phyluce_align_get_only_loci_with_min_taxa to ensure that the final set you are trimming contains the correct number of taxa.

If you continue to have issues with taxon name length, you could (1) use another trimming algorithm like trimAl (also implemented in phyluce) which I think will take long names or (2) shorten your taxon names using phyluce_align_convert_one_align_to_another which has an automated --shorten-names parameter, then run the shortened files back through Gblocks.
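As an illustration of the "sed or similar" suggestion for the spades suffix, here is a hedged Python sketch. It is not a phyluce tool; the function name and the default `_spades` suffix are assumptions taken from the headers shown earlier in this thread. It only edits header lines and leaves sequence lines alone.

```python
import re


def strip_assembler_suffix(fasta_text, suffix="_spades"):
    """Drop `suffix` when it ends the first token of a FASTA header line."""
    # Match the header up to the suffix, and require the suffix to be followed
    # by whitespace or end of line so mid-name occurrences are not touched.
    pattern = re.compile(r"^(>\S*?)" + re.escape(suffix) + r"(?=\s|$)")
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = pattern.sub(r"\1", line)
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, `>_R_uce-223111_Carabus_exiguus_DRR295678_spades |uce-223111` becomes `>_R_uce-223111_Carabus_exiguus_DRR295678 |uce-223111`. Whether that alone brings the names under the Gblocks limit depends on the remaining name length.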

crcardenas commented 1 year ago

It's rather unfortunate that Gblocks behaves that way. Thanks for the advice, though!

However, I used:

phyluce_align_remove_locus_name_from_files \
    --alignments mafft-nexus-internal-trimmed/ \
    --output testout/ \
    --input-format fasta \
    --output-format fasta \
    --cores 4

but the headers were converted in an unexpected way: the shortened name was added in front, and the original long header was kept after it:

>Agabetes_acuductus_SRR10334071 uce-235988_Agabetes_acuductus_SRR10334071 |uce-235988
....
>Adalia_bipunctata_GCA_910592335 uce-99_Adalia_bipunctata_GCA_910592335 |uce-99

I solved it with a pretty straightforward sed command though.

for i in ./testout/*.fasta; do sed -i '/^>/ s/ .*//' "$i"; done

I then reran the command that had failed, and the problem was solved.
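For anyone who prefers not to edit files in place with sed, the same header cleanup can be sketched in Python. This is only an equivalent of the sed expression above under the assumption that everything after the first space in a header is disposable; the function names are made up for illustration. The second helper checks that truncation did not create duplicate names within a file.

```python
from collections import Counter


def keep_first_token(fasta_text):
    """Truncate each header at its first space, like sed '/^>/ s/ .*//'."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = line.split(" ", 1)[0]
        out.append(line)
    return "\n".join(out) + "\n"


def duplicate_names(fasta_text):
    """Return header names that occur more than once in one FASTA file."""
    names = [l[1:] for l in fasta_text.splitlines() if l.startswith(">")]
    return sorted(name for name, count in Counter(names).items() if count > 1)
```

Running `duplicate_names(keep_first_token(text))` on each file before trimming is a cheap guard, since two taxa that collapse to the same name would be hard to tell apart downstream.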

brantfaircloth commented 1 year ago

Glad you got it working.