amkozlov / raxml-ng

RAxML Next Generation: faster, easier-to-use and more flexible
GNU Affero General Public License v3.0

constraint tree #200

Closed hjt1129 closed 1 day ago

hjt1129 commented 1 month ago

Hi, I ran raxml-ng on SNP data. When I used four individuals (S1, S2, S3, S4) from one population as the outgroup, they did not form a single lineage. I then used "--tree-constraint" to specify a constraint tree for these four individuals; the file content is "(S1,S2,S3,S4);". However, I get the error "Segmentation fault (core dumped)". What is wrong here, and how can I force these four individuals into one lineage? The command I ran is:

    raxml-ng --all --msa ./074GSC_12535_no39_tab.min4.min4.phy --outgroup S40 S41 S43 --model TVM+G4+ASC_LEWIS --tree pars{10} --bs-trees 1000 --tree-constraint constraint.tree

amkozlov commented 1 month ago

Please post your .raxml.log file as well as input data, and I will have a look.

hjt1129 commented 4 weeks ago

Hi, here are the log file and the input data. Please check. Thanks.


RAxML-NG v. 1.2.0 released on 09.05.2023 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth, Julia Haag, Anastasis Togkousidis.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

System: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz, 64 cores, 1007 GB RAM

RAxML-NG was called at 24-Oct-2024 04:01:16 as follows:

raxml-ng --all --msa ./074GSC_12535_no39_tab.min4.min4.phy --outgroup S40 S41 S43 --model TVM+G4+ASC_LEWIS --tree pars{10} --bs-trees 1000 --tree-constraint constraint.mld

Analysis options:
  run mode: ML tree search + bootstrapping (Felsenstein Bootstrap)
  start tree(s): parsimony (10)
  bootstrap replicates: parsimony (1000)
  topological constraint: constraint.mld (algorithm: NEW)
  outgroup taxa: S40
  random seed: 1729742476
  tip-inner: OFF
  pattern compression: ON
  per-rate scalers: OFF
  site repeats: ON
  logLH epsilon: general: 10.000000, brlen-triplet: 1000.000000
  fast spr radius: AUTO
  spr subtree cutoff: 1.000000
  fast CLV updates: ON
  branch lengths: proportional (ML estimate, algorithm: NR-FAST)
  SIMD kernels: AVX2
  parallelization: coarse-grained (auto), PTHREADS (auto)

[00:00:00] Reading alignment from file: ./074GSC_12535_no39_tab.min4.min4.phy
[00:00:00] Loaded alignment with 43 taxa and 9650 sites

Alignment comprises 1 partitions and 9620 patterns

Partition 0: noname
Model: TVM+FO+G4m+ASC_LEWIS
Alignment sites / patterns: 9650 / 9620
Gaps: 15.81 %
Invariant sites: 0.00 %

NOTE: Binary MSA file created: ./074GSC_12535_no39_tab.min4.min4.phy.raxml.rba

[00:00:00] Loading constraint tree from: constraint.mld
[00:00:00] Loaded non-comprehensive constraint tree with 3 taxa

Parallelization scheme autoconfig: 10 worker(s) x 6 thread(s)

[00:00:00] Generating 10 parsimony starting tree(s) with 43 taxa
Parallel parsimony with 60 threads
Parallel reduction/worker buffer size: 1 KB / 0 KB

[00:00:00] Data distribution: max. partitions/sites/weight per thread: 1 / 1604 / 25664
[00:00:00] Data distribution: max. searches per worker: 101

Starting ML tree search with 10 distinct starting trees

[00:00:11] [worker #3] ML tree search #4, logLikelihood: -120967.205838

amkozlov commented 4 weeks ago

1. Please try the latest version: https://github.com/amkozlov/raxml-ng/releases/tag/1.2.2
2. Outgroup taxa names must be separated by commas, e.g. --outgroup S40,S41,S43
3. If it still doesn't work, please attach the 074GSC_12535_no39_tab.min4.min4.phy and constraint.mld files, and I'll have a look.

hjt1129 commented 4 weeks ago

Hi, it still doesn't work. Attached are the .log, .phy and .mld files. Please check. Thanks.


RAxML-NG v. 1.2.2-master released on 30.04.2024 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth, Julia Haag, Anastasis Togkousidis.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

System: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz, 64 cores, 1007 GB RAM

RAxML-NG was called at 24-Oct-2024 13:14:27 as follows:

/home/tonghaojie/software/raxml-ng1.2.2/raxml-ng --all --msa ./074GSC_12535_no39_tab.min4.min4.phy --outgroup S40,S41,S43 --model TVM+G4+ASC_LEWIS --tree pars{10} --bs-trees 1000 --tree-constraint constraint.mld

Analysis options:
  run mode: ML tree search + bootstrapping (Felsenstein Bootstrap)
  start tree(s): parsimony (10)
  bootstrap replicates: parsimony (1000)
  topological constraint: constraint.mld (algorithm: NEW)
  outgroup taxa: S40,S41,S43
  random seed: 1729775667
  tip-inner: OFF
  pattern compression: ON
  per-rate scalers: OFF
  site repeats: ON
  logLH epsilon: general: 10.000000, brlen-triplet: 1000.000000
  fast spr radius: AUTO
  spr subtree cutoff: 1.000000
  fast CLV updates: ON
  branch lengths: proportional (ML estimate, algorithm: NR-FAST)
  SIMD kernels: AVX2
  parallelization: coarse-grained (auto), PTHREADS (auto)

[00:00:00] Reading alignment from file: ./074GSC_12535_no39_tab.min4.min4.phy
[00:00:00] Loaded alignment with 43 taxa and 9650 sites

Alignment comprises 1 partitions and 9620 patterns

Partition 0: noname
Model: TVM+FO+G4m+ASC_LEWIS
Alignment sites / patterns: 9650 / 9620
Gaps: 15.81 %
Invariant sites: 0.00 %

NOTE: Binary MSA file created: ./074GSC_12535_no39_tab.min4.min4.phy.raxml.rba

[00:00:00] Loading constraint tree from: constraint.mld
[00:00:00] Loaded non-comprehensive constraint tree with 3 taxa

Parallelization scheme autoconfig: 10 worker(s) x 6 thread(s)

[00:00:00] Generating 10 parsimony starting tree(s) with 43 taxa
Parallel parsimony with 60 threads
Parallel reduction/worker buffer size: 1 KB / 0 KB

[00:00:00] Data distribution: max. partitions/sites/weight per thread: 1 / 1604 / 25664
[00:00:00] Data distribution: max. searches per worker: 101

Starting ML tree search with 10 distinct starting trees

[00:00:17] [worker #3] ML tree search #4, logLikelihood: -121002.277519

amkozlov commented 4 weeks ago

Unfortunately, I don't see the attachments.

domivika commented 1 week ago

Hello,

I've run into an issue similar to the one described by the author of this post. I get the "Segmentation fault (core dumped)" error when running the command with 2 threads:

    raxml-ng --redo --msa results/fasta/family/67-of-79/aligned.fa --outgroup AAHYM041-16,AANIC173-10,AASFB349-10 --model GTR+G --tree-constraint results/fasta/family/67-of-79/remapped.tre --search --threads 2

When running with 1 thread:

    raxml-ng --redo --msa results/fasta/family/67-of-79/aligned.fa --outgroup AAHYM041-16,AANIC173-10,AASFB349-10 --model GTR+G --tree-constraint results/fasta/family/67-of-79/remapped.tre --search --threads 1

I get the following error:

    free(): invalid pointer
    Aborted (core dumped)

Please see the attached zip file for logs, trees, and other relevant files

Thank you! raxml_files.zip

amkozlov commented 1 week ago

Hi Dominika,

thanks for reporting.

Your MSA has only 4 sequences, and your constraint tree has only 3 taxa. This means that only one topology is possible under the topological constraint.

Sure, raxml-ng should still process this trivial situation without failure, and I will fix it in the next version. But for the time being, maybe you could just detect and handle such edge cases in your pipeline?
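A minimal sketch of how a pipeline might detect this edge case before calling raxml-ng. The helper names and the "more than 4 taxa" threshold are assumptions made for illustration, and the Newick handling only covers simple trees with unquoted names and no comments:

```python
import re


def count_fasta_sequences(fasta_text: str) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def newick_leaf_names(newick: str) -> list[str]:
    """Extract leaf names from a simple Newick string.

    Assumption: names are unquoted and the tree has no comments. A leaf name
    is any label that directly follows '(' or ','; inner node labels follow
    ')' and are therefore not captured.
    """
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)


def should_use_constraint(n_msa_taxa: int) -> bool:
    """Hypothetical rule: only pass a constraint for alignments with > 4 taxa."""
    return n_msa_taxa > 4


# Example with a 4-sequence alignment and a 3-taxon constraint tree
fasta = ">A\nACGT\n>B\nACGT\n>C\nACGT\n>D\nACGT\n"
constraint = "((A,B),C);"
n_taxa = count_fasta_sequences(fasta)
print(n_taxa, newick_leaf_names(constraint), should_use_constraint(n_taxa))
```

In a workflow, the result of `should_use_constraint` could decide whether `--tree-constraint` is added to the command line at all.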

domivika commented 1 week ago

Hello,

Thanks for your reply. For now, my pipeline simply skips files with an insufficient number of aligned sequences (<4). Also, if there is an error regarding the constraint tree like this:

    [00:00:00] Loaded comprehensive constraint tree with 5 taxa
    ERROR: You provided a comprehensive, fully-resolved tree as a topological constraint. Since this is almost certainly not what you intended, RAxML-NG will now exit...

I simply copy the input tree to the output one for the sake of testing.

I think the problem lies somewhere else. This is my Snakemake rule for the raxml-ng tool:

rule run_raxml:
    input:
        alignment="results/fasta/family/{scatteritem}/aligned.fa",
        tree="results/fasta/family/{scatteritem}/remapped.tre"
    output:
        tree="results/fasta/family/{scatteritem}/aligned.fa.raxml.bestTree"
    params:
        model=config['model'],
        num_outgroups=config['outgroups']
    log: "logs/run_raxml/run_raxml-{scatteritem}.log"
    conda: "envs/raxml.yml"
    shell:
        """
        # Count the sequences in the alignment (FASTA header lines start with '>')
        TAXON_COUNT=$(grep -c '^>' {input.alignment})

        # Ensure there are at least 4 taxa
        if [ "$TAXON_COUNT" -lt 4 ]; then
            echo "Skipping RAxML-NG: alignment has only $TAXON_COUNT taxa, which is insufficient." > {log}
            echo "No tree generated due to insufficient taxa." > {output.tree}
            exit 0
        fi

        # Use the last N sequence names in the alignment as the outgroup,
        # joined by commas without a trailing comma
        OG=$(grep '^>' {input.alignment} | tail -n {params.num_outgroups} | sed -e 's/^>//' | paste -sd, -)

        # Define a function to run RAxML-NG with error handling
        run_raxml () {{
            raxml-ng \
                --redo \
                --msa {input.alignment} \
                --outgroup $OG \
                --model {params.model} \
                --tree-constraint {input.tree} \
                --threads $1 \
                --search > {log} 2>&1
        }}

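        # Run with the constraint tree if the file exists and is non-empty
        if [ -s {input.tree} ]; then
            # Snakemake runs shell commands in bash strict mode by default, so
            # capture the exit status instead of letting a failed run abort the job
            STATUS=0
            run_raxml 20 || STATUS=$?

            # Handle errors based on log output and exit status
            if grep -q "core oversubscription" {log}; then
                echo "Oversubscription detected; rerunning with fewer threads." >> {log}
                run_raxml 10
            elif [ "$STATUS" -ge 128 ]; then
                # Exit status >= 128 means the process was killed by a signal
                # (e.g. SIGSEGV, SIGABRT); the shell's "core dumped" message is
                # printed by bash itself and never reaches the log file
                echo "Crash detected (exit status $STATUS), retrying with single-thread mode." >> {log}
                run_raxml 1
            elif grep -q "ERROR: You provided a comprehensive" {log}; then
                echo "Comprehensive constraint detected; copying input tree to output." >> {log}
                cp {input.tree} {output.tree}
            fi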
        else
            # Run without constraint if not available
            raxml-ng \
                --redo \
                --msa {input.alignment} \
                --outgroup $OG \
                --model {params.model} \
                --threads 20 \
                --search > {log} 2>&1
        fi
        """

and I keep getting this "core dumped" error (only for a few FASTA files):

    environment: line 16: 4153872 Aborted                 (core dumped) raxml-ng --redo --msa results/fasta/family/67-of-79/aligned.fa --outgroup $OG --model GTR+G --tree-constraint results/fasta/family/67-of-79/remapped.tre --threads $1 --search > logs/run_raxml/run_raxml-67-of-79.log 2>&1
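Note that the "Aborted (core dumped)" line above is printed by the shell, not by raxml-ng, so it does not end up in the redirected log file. A minimal sketch, assuming a POSIX system, of detecting such a crash from the exit status instead of the log text (the function name `run_and_classify` is hypothetical):

```python
import signal
import subprocess
import sys


def run_and_classify(cmd: list[str], log_path: str) -> str:
    """Run cmd, write stdout and stderr to log_path, and classify the outcome."""
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was terminated
        # by a signal (e.g. SIGSEGV or SIGABRT)
        return "crashed:" + signal.Signals(-proc.returncode).name
    # Positive return code: the program exited on its own with an error
    return "failed"


if __name__ == "__main__":
    # Demonstration with a child process that aborts itself
    outcome = run_and_classify([sys.executable, "-c", "import os; os.abort()"], "demo.log")
    print(outcome)
```

The same function could wrap the raxml-ng call from the rule, with a "crashed" outcome triggering a rerun without the constraint.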

I tried digging into this issue, and apparently the program is trying to access memory it doesn't have access to? I'm a bit confused. Example: relevant reddit thread

Do you have any idea what is going on?

Thanks! Dominika

domivika commented 1 week ago

Hi,

It seems like the issue is also similar to #152

amkozlov commented 1 week ago

Well, as I said, the original problem that leads to the core dump has to be fixed inside raxml-ng.

But as a quick workaround, I suggest you simply disable the tree constraint (and probably also the outgroup) if the alignment has only 4 taxa, since a constraint makes little sense in that case.

Maybe you could even skip 4-taxa MSAs altogether. Please note that with 4 taxa there are only 3 possible unrooted tree topologies, i.e. just 2 alternatives to any given one.
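To put the 4-taxa case in perspective, here is a short sketch that counts unrooted, fully resolved topologies with the standard formula (2n-5)!! for n taxa. The function name is chosen here for illustration only:

```python
def num_unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted, fully resolved tree topologies: (2n-5)!!."""
    if n_taxa < 3:
        return 1
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (3, 4, 5, 6):
    print(n, num_unrooted_topologies(n))
```

With 4 taxa the search space is just 3 trees, so any given tree has only 2 alternatives, which is why a tree search (let alone a constrained one) adds little there.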

domivika commented 1 week ago

Okay, I understand! Will do as you suggest. Thanks a lot for your quick answer. I'm looking forward to the next version!

Dominika

amkozlov commented 1 day ago

Fixed in the dev branch.