evotools / CattleGraphGenomePaper

Set of scripts for the paper on the cattle graph genome

Help about non-reference sequence detection #1

Closed Johnsonzcode closed 1 year ago

Johnsonzcode commented 1 year ago

Dear @npch @prenderj @RenzoTale88 @prasundutta87, your code shows us a clear way to construct a pangenome. However, when I tried to detect non-reference sequences, I couldn't find any help documentation, and I don't understand the parameters in CattleGraphGenomePaper/detectSequences/nf-GraphSeq/conf/params.config. Could you please write some help documentation or explain the parameters for us?

Thank you in advance!

Best Johnson

RenzoTale88 commented 1 year ago

@Johnsonzcode I've added some more detail to the workflow inputs. Let me know if you have any more questions on this.

Andrea

Johnsonzcode commented 1 year ago

@RenzoTale88 I appreciate it a lot! I do need some more information. Should I use the pangenome vg file from this pipeline https://github.com/evotools/CattleGraphGenomePaper/tree/master/cactus or from this pipeline https://github.com/evotools/CattleGraphGenomePaper/tree/master/wgcactus to detect non-reference sequences? If I should use the vg file from https://github.com/evotools/CattleGraphGenomePaper/tree/master/cactus, how do I merge the vg files from the different chromosomes? According to the paper, the chromosome-by-chromosome pangenome was built for VG5 with this pipeline https://github.com/evotools/CattleGraphGenomePaper/tree/master/cactus, and it seems the non-reference sequences were detected from the pangenome generated by this pipeline https://github.com/evotools/CattleGraphGenomePaper/tree/master/wgcactus. Is that right? I have already generated the chromosome-by-chromosome pangenome, but I used chrN as the sequence names for all assemblies. Must the pangenome be built following the GENOME.SEQUENCE naming convention? If so, maybe I should build another one.

Looking forward to your reply. Johnson

RenzoTale88 commented 1 year ago

@Johnsonzcode

  1. Use the graph from here
  2. The VG5 for the realignment is indeed the chromosome-by-chromosome one, but for the novel sequences we used the wgcactus one
  3. I recommend following the same naming convention. I think it would work with other naming structures as long as they uniquely refer to the reference, but I can't guarantee it
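
For anyone who has already named every sequence chrN, the renaming step could be sketched as below. This is only an illustration, assuming the convention means each FASTA header reads GENOME.SEQUENCE (for example CAU_Wild.chr1); the function name and the choice to drop the header description are hypothetical, and the exact format the pipelines expect should be checked against the repository before rebuilding the graph.

```python
import sys


def prefix_fasta_headers(lines, genome):
    """Yield FASTA lines with every header renamed to GENOME.SEQUENCE."""
    for line in lines:
        if line.startswith(">"):
            # Keep only the sequence ID (first whitespace-delimited token);
            # any free-text description after it is dropped.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            yield f">{genome}.{seq_id}\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python rename_headers.py GENOME assembly.fa > renamed.fa
    genome_name, fasta_path = sys.argv[1], sys.argv[2]
    with open(fasta_path) as handle:
        sys.stdout.writelines(prefix_fasta_headers(handle, genome_name))
```

Running this once per assembly, with that assembly's own genome name, keeps chr1 from one genome distinct from chr1 in another.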

Johnsonzcode commented 1 year ago

@RenzoTale88 After building the VG5, the SVs and small variants should be added to it. To do that with the vg software, the chromosome-by-chromosome graphs need to be merged into one vg file. How do I do that?

RenzoTale88 commented 1 year ago

@Johnsonzcode at the time, it was possible simply by concatenating the multiple VG files with cat (see here) and then running vg ids -j. In recent versions, I believe the recommended way is to use vg combine.
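
A minimal sketch of the vg combine route, wrapped in Python so it can be scripted over many chromosomes. It assumes a recent vg is on the PATH and that vg combine takes the input graphs as positional arguments and writes the merged graph to standard output; the helper names and file names are hypothetical, so check `vg combine --help` for the installed version before relying on it.

```python
import subprocess


def combine_command(graphs):
    """Build the argument list for `vg combine` over the given graph files."""
    if not graphs:
        raise ValueError("need at least one graph file")
    return ["vg", "combine", *graphs]


def merge_graphs(graphs, out_path):
    """Run `vg combine` and write the merged graph to out_path."""
    with open(out_path, "wb") as out:
        # check=True raises CalledProcessError if vg exits non-zero.
        subprocess.run(combine_command(graphs), stdout=out, check=True)


# Example (hypothetical file names):
# merge_graphs(["chr1.vg", "chr2.vg", "chr3.vg"], "all_chromosomes.vg")
```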

Johnsonzcode commented 1 year ago

@RenzoTale88 I appreciate it so much.

Johnsonzcode commented 1 year ago

@RenzoTale88

  1. About the input of nf-GraphSeq: if there are no short contigs in the assembly, i.e. the short contigs were filtered out, should I just provide an empty file for --contigs?
  2. How do I calculate the repetitiveness? I've tried bedtools nuc and seqkit fx2tab, but neither is appropriate. Are there any recommendations?
  3. If I don't have proteins, can I just provide a blank file?

RenzoTale88 commented 1 year ago

@Johnsonzcode

  1. Yes, it should work. Let me know if it doesn't though and I'll patch it
  2. You can get the numbers by using the script here

Johnsonzcode commented 1 year ago

@RenzoTale88 The pipeline fails with errors.

N E X T F L O W  ~  version 22.10.1
Launching `nf-GraphSeq/main.nf` [peaceful_mayer] DSL2 - revision: 77e3a1fa1e
Non-ref sequence   v 0.5a
================================
PG                         : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_align.vg
Reference genome           : xxx
Sequence pool              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs                : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness  : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta             : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/pep.all.fa.gz
Flanking regions           : 1000
Gaps flanking regions      : 1000
Novelty cutoff (ratio)     : 0.95

executor >  local (10)
[33/ad3885] process > make_diamond_db (makedb)      [100%] 1 of 1 ✔
[8d/6b2f68] process > non_ref_nodes (non_ref_nodes) [100%] 3 of 3, failed: 3, retries: 2 ✔
[7a/dd0004] process > ref_nodes (non_ref_nodes)     [100%] 3 of 3, failed: 3, retries: 2 ✔
[-        ] process > add_support_vector            -
[49/371926] process > get_gaps (get_gaps)           [100%] 3 of 3, failed: 3, retries: 2 ✔
[-        ] process > add_gap_info                  -
[-        ] process > combine_regions               -
[-        ] process > label_regions                 -
[-        ] process > get_repetitiveness            -
[-        ] process > cleanup                       -
[-        ] process > bedToFasta                    -
[-        ] process > selfalign                     -
[-        ] process > simplify                      -
[-        ] process > getfasta                      -
[-        ] process > getfasta_flanked              -
[-        ] process > blastx                        -
[-        ] process > abinitio                      -
[-        ] process > filter_abinitio               -
[-        ] process > abinitio_flank                -
[-        ] process > filter_abinitio_flank         -
[-        ] process > consolidate                   -
[ee/889161] NOTE: Process `get_gaps (get_gaps)` terminated with an error exit status (127) -- Execution is retried (1)
[3c/7ae61a] NOTE: Process `get_gaps (get_gaps)` terminated with an error exit status (127) -- Execution is retried (2)
[67/0493c3] NOTE: Process `ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Execution is retried (1)
[49/371926] NOTE: Process `get_gaps (get_gaps)` terminated with an error exit status (127) -- Error is ignored
[4c/200496] NOTE: Process `non_ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Execution is retried (1)
[98/495b0a] NOTE: Process `ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Execution is retried (2)
[4d/e7e193] NOTE: Process `non_ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Execution is retried (2)
[7a/dd0004] NOTE: Process `ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Error is ignored
[8d/6b2f68] NOTE: Process `non_ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Error is ignored

How can I debug this to find the cause? Do you have any ideas? Thank you!

RenzoTale88 commented 1 year ago

Hi, could you share the .nextflow.log file in the working directory, and the .command.err and .command.log files in work/8d/6b2f68*/? Are you running the workflow with the conda profile (-profile conda)?
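
For reference, the same information can be collected with a small script. The sketch below assumes the "Task completed > TaskHandler[...]" entries that Nextflow 22.10 writes to .nextflow.log and prints the tail of each failed task's .command.err; the function names are hypothetical. Exit status 127 conventionally means the shell could not find a command, which would be consistent with the required tools not being on the PATH, while exit status 1 is a generic failure whose cause has to be read from .command.err.

```python
import re
from pathlib import Path

# Matches entries such as:
# TaskHandler[id: 3; name: get_gaps (get_gaps); status: COMPLETED; exit: 127; error: -; workDir: /path]
TASK_RE = re.compile(
    r"TaskHandler\[id: (?P<id>\d+); name: (?P<name>.+?); status: \w+; "
    r"exit: (?P<exit>-?\d+); error: .*?; workDir: (?P<workdir>[^\]\s]+)"
)


def failed_tasks(log_text):
    """Return (name, exit_code, workdir) for every task with a non-zero exit code."""
    return [
        (m["name"], int(m["exit"]), m["workdir"])
        for m in TASK_RE.finditer(log_text)
        if int(m["exit"]) != 0
    ]


def show_errors(log_path=".nextflow.log", tail=20):
    """Print the last lines of .command.err for each failed task in the log."""
    text = Path(log_path).read_text()
    for name, code, workdir in failed_tasks(text):
        print(f"== {name} (exit {code}) in {workdir}")
        err_file = Path(workdir) / ".command.err"
        if err_file.exists():
            for line in err_file.read_text().splitlines()[-tail:]:
                print("   " + line)
        else:
            print("   (no .command.err found)")
```

Calling show_errors() from the directory where the workflow was launched lists each failed attempt, including the retries.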



Johnsonzcode commented 1 year ago

Here is the config:

params {
    pg = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg'
    genome_pool = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa'
    contigs = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt'
    scaffolds = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt'
    repetitiveness = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt'
    reference = 'CAU_Wild'
    proteins = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz'
    flanks = 1000
    gap_flanks = 1000
    novelty_cutoff = 0.95
    outfolder = 'outdir'
    frc = false
    help = false
    publish_dir_mode = 'copy'
    extra_cluster_options = ''
}

And the command line:

nextflow run nf-GraphSeq/main.nf

RenzoTale88 commented 1 year ago

Try running it with nextflow run nf-GraphSeq/main.nf -profile conda and see if it works.



Johnsonzcode commented 1 year ago

Here are the .nextflow.log, .command.err and .command.log.


Jan-02 12:12:26.741 [main] DEBUG nextflow.cli.Launcher - $> nextflow run nf-GraphSeq/main.nf
Jan-02 12:12:26.855 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 22.10.1
Jan-02 12:12:26.886 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/storage-01/poultrylab1/.nextflow/plugins; core-plugins: nf-amazon@1.11.0,nf-azure@0.14.2,nf-codecommit@0.1.2,nf-console@1.0.4,nf-ga4gh@1.0.4,nf-google@1.4.4,nf-tower@1.5.5,nf-wave@0.5.2
Jan-02 12:12:26.900 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Jan-02 12:12:26.901 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Jan-02 12:12:26.907 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Jan-02 12:12:26.922 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
Jan-02 12:12:26.945 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/nextflow.config
Jan-02 12:12:26.947 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/nextflow.config
Jan-02 12:12:26.970 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Jan-02 12:12:27.870 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from script declararion
Jan-02 12:12:27.889 [main] INFO  nextflow.cli.CmdRun - Launching `nf-GraphSeq/main.nf` [peaceful_mayer] DSL2 - revision: 77e3a1fa1e
Jan-02 12:12:27.890 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Jan-02 12:12:27.890 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
Jan-02 12:12:27.900 [main] DEBUG nextflow.secret.LocalSecretsProvider - Secrets store: /storage-01/poultrylab1/.nextflow/secrets/store.json
Jan-02 12:12:27.904 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@3468ee6e] - activable => nextflow.secret.LocalSecretsProvider@3468ee6e
Jan-02 12:12:27.967 [main] DEBUG nextflow.Session - Session UUID: cfa054dd-5654-4180-aec4-e41dd42e351b
Jan-02 12:12:27.968 [main] DEBUG nextflow.Session - Run name: peaceful_mayer
Jan-02 12:12:27.968 [main] DEBUG nextflow.Session - Executor pool size: 208
Jan-02 12:12:27.979 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=624; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Jan-02 12:12:28.006 [main] DEBUG nextflow.cli.CmdRun -
Version: 22.10.1 build 5828
Created: 27-10-2022 16:58 UTC (28-10-2022 00:58 CDT)
System: Linux 3.10.0-1127.19.1.el7.x86_64
Runtime: Groovy 3.0.13 on OpenJDK 64-Bit Server VM 11.0.13+7-b1751.21
Encoding: UTF-8 (UTF-8)
Process: 298667@pbsnode01 [202.112.170.234]
CPUs: 208 - Mem: 1007.1 GB (439.6 GB) - Swap: 64 GB (62 GB)
Jan-02 12:12:28.028 [main] DEBUG nextflow.Session - Work-dir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work [xfs]
Jan-02 12:12:28.048 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Jan-02 12:12:28.060 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Jan-02 12:12:28.089 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Jan-02 12:12:28.101 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 209; maxThreads: 1000
Jan-02 12:12:28.234 [main] DEBUG nextflow.Session - Session start
Jan-02 12:12:28.526 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Jan-02 12:12:28.547 [main] INFO  nextflow.Nextflow - Non-ref sequence   v 0.5a
================================
PG                         : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg
Reference genome           : CAU_Wild
Sequence pool              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs                : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness  : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta             : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions           : 1000
Gaps flanking regions      : 1000
Novelty cutoff (ratio)     : 0.95

Jan-02 12:12:29.269 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:large matches labels large for process with name make_diamond_db
Jan-02 12:12:29.290 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.290 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.299 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
Jan-02 12:12:29.308 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=208; memory=1007.1 GB; capacity=208; pollInterval=100ms; dumpInterval=5m
Jan-02 12:12:29.473 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:large matches labels large for process with name non_ref_nodes
Jan-02 12:12:29.475 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.475 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.485 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:large matches labels large for process with name ref_nodes
Jan-02 12:12:29.486 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.486 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.493 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:medium matches labels medium for process with name add_support_vector
Jan-02 12:12:29.495 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.495 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.502 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:medium matches labels medium for process with name get_gaps
Jan-02 12:12:29.503 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.503 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.510 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:medium matches labels medium for process with name add_gap_info
Jan-02 12:12:29.511 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.511 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.518 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:medium matches labels medium for process with name combine_regions
Jan-02 12:12:29.519 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.519 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.524 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:small matches labels small for process with name label_regions
Jan-02 12:12:29.525 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.526 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.562 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:medium matches labels medium for process with name get_repetitiveness
Jan-02 12:12:29.563 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.563 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.568 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:medium matches labels medium for process with name cleanup
Jan-02 12:12:29.569 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.569 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.582 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:small matches labels small for process with name bedToFasta
Jan-02 12:12:29.583 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.583 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.594 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:medium matches labels medium for process with name selfalign
Jan-02 12:12:29.594 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.594 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.598 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:medium matches labels medium for process with name simplify
Jan-02 12:12:29.599 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.599 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.603 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:small matches labels small for process with name getfasta
Jan-02 12:12:29.604 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.604 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.608 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:small matches labels small for process with name getfasta_flanked
Jan-02 12:12:29.609 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.609 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.616 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:large matches labels large for process with name blastx
Jan-02 12:12:29.617 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.617 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.624 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:large matches labels large for process with name abinitio
Jan-02 12:12:29.625 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.625 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.629 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:large matches labels large for process with name filter_abinitio
Jan-02 12:12:29.629 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.630 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.633 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:large matches labels large for process with name abinitio_flank
Jan-02 12:12:29.634 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.634 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.638 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:large matches labels large for process with name filter_abinitio_flank
Jan-02 12:12:29.639 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.639 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.643 [main] DEBUG nextflow.script.ProcessConfig - Config settings withLabel:large matches labels large for process with name consolidate
Jan-02 12:12:29.643 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.643 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.646 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: abinitio_flank, filter_abinitio, getfasta, consolidate, add_support_vector, combine_regions, frc_filter, get_gaps, add_gap_info, ref_nodes, non_ref_nodes, get_repetitiveness, getfasta_flanked, label_regions, filter_abinitio_flank, selfalign, cleanup, make_diamond_db, abinitio, bedToFasta, blastx, simplify
Jan-02 12:12:29.646 [main] DEBUG nextflow.Session - Igniting dataflow network (21)
Jan-02 12:12:29.647 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > make_diamond_db
Jan-02 12:12:29.648 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > non_ref_nodes
Jan-02 12:12:29.649 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > ref_nodes
Jan-02 12:12:29.649 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > add_support_vector
Jan-02 12:12:29.651 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > get_gaps
Jan-02 12:12:29.655 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > add_gap_info
Jan-02 12:12:29.656 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > combine_regions
Jan-02 12:12:29.659 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > label_regions
Jan-02 12:12:29.663 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > get_repetitiveness
Jan-02 12:12:29.665 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > cleanup
Jan-02 12:12:29.665 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > bedToFasta
Jan-02 12:12:29.666 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > selfalign
Jan-02 12:12:29.667 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > simplify
Jan-02 12:12:29.667 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > getfasta
Jan-02 12:12:29.669 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > getfasta_flanked
Jan-02 12:12:29.671 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > blastx
Jan-02 12:12:29.671 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > abinitio
Jan-02 12:12:29.672 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > filter_abinitio
Jan-02 12:12:29.674 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > abinitio_flank
Jan-02 12:12:29.674 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > filter_abinitio_flank
Jan-02 12:12:29.676 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > consolidate
Jan-02 12:12:29.677 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
Jan-02 12:12:29.678 [main] DEBUG nextflow.Session - Session await
Jan-02 12:12:29.875 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:29.879 [Task submitter] INFO nextflow.Session - [67/0493c3] Submitted process > ref_nodes (non_ref_nodes)
Jan-02 12:12:29.887 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:29.887 [Task submitter] INFO nextflow.Session - [33/ad3885] Submitted process > make_diamond_db (makedb)
Jan-02 12:12:29.892 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:29.893 [Task submitter] INFO nextflow.Session - [ee/889161] Submitted process > get_gaps (get_gaps)
Jan-02 12:12:29.908 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:29.908 [Task submitter] INFO nextflow.Session - [4c/200496] Submitted process > non_ref_nodes (non_ref_nodes)
Jan-02 12:12:29.950 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 3; name: get_gaps (get_gaps); status: COMPLETED; exit: 127; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ee/8891615478d38741b5b5d60707cfcc]
Jan-02 12:12:29.969 [Task monitor] INFO nextflow.processor.TaskProcessor - [ee/889161] NOTE: Process get_gaps (get_gaps) terminated with an error exit status (127) -- Execution is retried (1)
Jan-02 12:12:29.979 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:29.980 [Task submitter] INFO nextflow.Session - [3c/7ae61a] Re-submitted process > get_gaps (get_gaps)
Jan-02 12:12:30.004 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 5; name: get_gaps (get_gaps); status: COMPLETED; exit: 127; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/3c/7ae61a11a0309e8b12155103a06251]
Jan-02 12:12:30.006 [Task monitor] INFO nextflow.processor.TaskProcessor - [3c/7ae61a] NOTE: Process get_gaps (get_gaps) terminated with an error exit status (127) -- Execution is retried (2)
Jan-02 12:12:30.012 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:30.013 [Task submitter] INFO nextflow.Session - [49/371926] Re-submitted process > get_gaps (get_gaps)
Jan-02 12:12:30.025 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 2; name: ref_nodes (non_ref_nodes); status: COMPLETED; exit: 1; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/67/0493c3012c3d9db7140126e10845a8]
Jan-02 12:12:30.027 [Task monitor] INFO nextflow.processor.TaskProcessor - [67/0493c3] NOTE: Process ref_nodes (non_ref_nodes) terminated with an error exit status (1) -- Execution is retried (1)
Jan-02 12:12:30.033 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 6; name: get_gaps (get_gaps); status: COMPLETED; exit: 127; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/49/371926995e814c580a2542579f55a6]
Jan-02 12:12:30.033 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:30.034 [Task submitter] INFO nextflow.Session - [98/495b0a] Re-submitted process > ref_nodes (non_ref_nodes)
Jan-02 12:12:30.035 [Task monitor] INFO nextflow.processor.TaskProcessor - [49/371926] NOTE: Process get_gaps (get_gaps) terminated with an error exit status (127) -- Error is ignored
Jan-02 12:12:30.088 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 4; name: non_ref_nodes (non_ref_nodes); status: COMPLETED; exit: 1; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/4c/2004966fa2d9e47295125d7ed44598]
Jan-02 12:12:30.090 [Task monitor] INFO nextflow.processor.TaskProcessor - [4c/200496] NOTE: Process non_ref_nodes (non_ref_nodes) terminated with an error exit status (1) -- Execution is retried (1)
Jan-02 12:12:30.099 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:30.099 [Task submitter] INFO nextflow.Session - [4d/e7e193] Re-submitted process > non_ref_nodes (non_ref_nodes)
Jan-02 12:12:30.183 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 7; name: ref_nodes (non_ref_nodes); status: COMPLETED; exit: 1; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/98/495b0a7d7090e5bf03b7f7613fd001]
Jan-02 12:12:30.185 [Task monitor] INFO nextflow.processor.TaskProcessor - [98/495b0a] NOTE: Process ref_nodes (non_ref_nodes) terminated with an error exit status (1) -- Execution is retried (2)
Jan-02 12:12:30.192 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:30.193 [Task submitter] INFO nextflow.Session - [7a/dd0004] Re-submitted process > ref_nodes (non_ref_nodes)
Jan-02 12:12:30.276 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 8; name: non_ref_nodes (non_ref_nodes); status: COMPLETED; exit: 1; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/4d/e7e193db2dce336ea3185004417e74]
Jan-02 12:12:30.278 [Task monitor] INFO nextflow.processor.TaskProcessor - [4d/e7e193] NOTE: Process non_ref_nodes (non_ref_nodes) terminated with an error exit status (1) -- Execution is retried (2)
Jan-02 12:12:30.285 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:30.285 [Task submitter] INFO nextflow.Session - [8d/6b2f68] Re-submitted process > non_ref_nodes (non_ref_nodes)
Jan-02 12:12:30.345 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: make_diamond_db (makedb); status: COMPLETED; exit: 0; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/33/ad3885d42846a999d57b1894bf08c4]
Jan-02 12:12:30.358 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 9; name: ref_nodes (non_ref_nodes); status: COMPLETED; exit: 1; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/7a/dd000427cd6a6b220976c877c8e362]
Jan-02 12:12:30.359 [Task monitor] INFO nextflow.processor.TaskProcessor - [7a/dd0004] NOTE: Process ref_nodes (non_ref_nodes) terminated with an error exit status (1) -- Error is ignored
Jan-02 12:12:30.430 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 10; name: non_ref_nodes (non_ref_nodes); status: COMPLETED; exit: 1; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/8d/6b2f6876f776f59f5e126f779e82ca]
Jan-02 12:12:30.431 [Task monitor] INFO nextflow.processor.TaskProcessor - [8d/6b2f68] NOTE: Process non_ref_nodes (non_ref_nodes) terminated with an error exit status (1) -- Error is ignored
Jan-02 12:12:30.433 [main] DEBUG nextflow.Session - Session await > all processes finished
Jan-02 12:12:30.530 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jan-02 12:12:30.540 [main] DEBUG nextflow.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=1; failedCount=9; ignoredCount=3; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=6; abortedCount=0; succeedDuration=850ms; failedDuration=1.1s; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=4; peakCpus=7; peakMemory=20 GB; ]
Jan-02 12:12:30.745 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Jan-02 
12:12:30.756 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'FileTransfer' shutdown completed (hard=false) Jan-02 12:12:30.756 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

File "/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/bin/01A-NonRefNodes", line 56
    if tot % 100000 == 0: print("Processed {} nodes {}\r".format(tot, " " * 50), end = '')
                                                                                     ^
SyntaxError: invalid syntax
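A SyntaxError on the `end = ''` keyword of `print` is what a Python 2 interpreter reports, so the script was presumably picked up by an old `python` on the PATH. The sketch below is one way to check the interpreter before launching the workflow. The 3.6 minimum is an assumption based on the `print(..., end='')` call and the f-strings visible in this thread, not a documented requirement of nf-GraphSeq.

```python
import sys

# Assumption: the pipeline scripts need Python 3.6+ because they use
# print(..., end='') and f-strings; the thread does not state a minimum.
MIN_VERSION = (3, 6)


def interpreter_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the given interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if interpreter_ok() else "too old"
    print("Python {}.{} is {}".format(sys.version_info[0], sys.version_info[1], status))
```

Running it with the same `python` that the workflow would call shows whether the active environment is the problem.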

Johnsonzcode commented 1 year ago

Try to run it with nextflow run nf-GraphSeq/main.nf -profile conda and see if it works.

From: johnsonz
Sent: Monday, January 2, 2023 12:41:25 PM
Subject: Re: [evotools/CattleGraphGenomePaper] Help about non-reference sequence detection (Issue #1)

There is conf:

params {
    pg = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg'
    genome_pool = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa'
    contigs = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt'
    scaffolds = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt'
    repetitiveness = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt'
    reference = 'CAU_Wild'
    proteins = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz'
    flanks = 1000
    gap_flanks = 1000
    novelty_cutoff = 0.95
    outfolder = 'outdir'
    frc = false
    help = false
    publish_dir_mode = 'copy'
    extra_cluster_options = ''
}

And the command line:

nextflow run nf-GraphSeq/main.nf

~/yin/software/cmake-3.16.2-Linux-x86_64/bin/cmake
(nextflow) [poultrylab1@pbsnode01 get_non_ref_seq]$ which gcc
/usr/bin/gcc
(nextflow) [poultrylab1@pbsnode01 get_non_ref_seq]$ which g++
/usr/bin/g++
(nextflow) [poultrylab1@pbsnode01 get_non_ref_seq]$  export CMAKE_C_COMPILER=/usr/bin/gcc
(nextflow) [poultrylab1@pbsnode01 get_non_ref_seq]$  export CMAKE_CXX_COMPILER=/usr/bin/g++
(nextflow) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda
N E X T F L O W  ~  version 22.10.1
Launching `nf-GraphSeq/main.nf` [pedantic_knuth] DSL2 - revision: 77e3a1fa1e
Non-ref sequence   v 0.5a
================================
PG                         : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg
Reference genome           : CAU_Wild
Sequence pool              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs                : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness  : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta             : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions           : 1000
Gaps flanking regions      : 1000
Novelty cutoff (ratio)     : 0.95

executor >  local (4)
[29/c55648] process > make_diamond_db (makedb)      [  0%] 0 of 1
[27/2b2f58] process > non_ref_nodes (non_ref_nodes) [  0%] 0 of 1
[5d/af1766] process > ref_nodes (non_ref_nodes)     [  0%] 0 of 1
[-        ] process > add_support_vector            -
[3c/0fc77f] process > get_gaps (get_gaps)           [  0%] 0 of 1
[-        ] process > add_gap_info                  -
[-        ] process > combine_regions               -
[-        ] process > label_regions                 -
executor >  local (4)
[-        ] process > make_diamond_db (makedb)      -
[-        ] process > non_ref_nodes (non_ref_nodes) -
[-        ] process > ref_nodes (non_ref_nodes)     -
[-        ] process > add_support_vector            -
[3c/0fc77f] process > get_gaps (get_gaps)           [100%] 1 of 1, failed: 1 ✘
[-        ] process > add_gap_info                  -
[-        ] process > combine_regions               -
[-        ] process > label_regions                 -
[-        ] process > get_repetitiveness            -
[-        ] process > cleanup                       -
[-        ] process > bedToFasta                    -
[-        ] process > selfalign                     -
[-        ] process > simplify                      -
[-        ] process > getfasta                      -
[-        ] process > getfasta_flanked              -
[-        ] process > blastx                        -
[-        ] process > abinitio                      -
[-        ] process > filter_abinitio               -
[-        ] process > abinitio_flank                -
[-        ] process > filter_abinitio_flank         -
[-        ] process > consolidate                   -
Error executing process > 'get_gaps (get_gaps)'

Caused by:
  Process `get_gaps (get_gaps)` terminated with an error exit status (127)

Command executed:

  faToTwoBit genome_pooled.fa genome_pooled.2bit
  twoBitInfo -nBed genome_pooled.2bit stdout |         awk -v var=1000 'BEGIN{OFS="     "}; $2-var < 0{print $1,"0",$3+var}; $2-var >= 0{print $1,$2-var,$3+var}' |         bedtools sort -i - > gaps.bed

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: faToTwoBit: command not found

Work dir:
  /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/3c/0fc77fbed01a86c92cec8d7a2e941d

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
Johnsonzcode commented 1 year ago

The error was fixed by installing the dependencies. Sorry about that. The command line is:

nextflow run nf-GraphSeq/main.nf -profile conda
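
Exit status 127 with "command not found" means a tool was missing from the PATH. A minimal sketch for checking this up front is shown here. The tool list only contains the programs named in the commands quoted in this thread, so it is an assumption that it is incomplete rather than the full dependency list of the workflow.

```python
import shutil

# Tools named in the commands shown in this thread; the workflow may need
# more than these (assumption, not the official dependency list).
REQUIRED_TOOLS = ["faToTwoBit", "twoBitInfo", "bedtools", "samtools", "awk"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found.")
```

The `which` argument is only there so the lookup can be replaced in a test; in normal use the default `shutil.which` is enough.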
Johnsonzcode commented 1 year ago

But there is another error:

(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda
N E X T F L O W  ~  version 22.10.4
Launching `nf-GraphSeq/main.nf` [desperate_jones] DSL2 - revision: 77e3a1fa1e
Non-ref sequence   v 0.5a
================================
PG                         : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg
Reference genome           : CAU_Wild
Sequence pool              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs                : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness  : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta             : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions           : 1000
Gaps flanking regions      : 1000
Novelty cutoff (ratio)     : 0.95

executor >  local (8)
[42/72c12e] process > make_diamond_db (makedb)      [100%] 1 of 1 ✔
[fd/9a4a4a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[9b/15f2f9] process > ref_nodes (non_ref_nodes)     [100%] 1 of 1 ✔
[a3/469cfe] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[e9/757e11] process > get_gaps (get_gaps)           [100%] 1 of 1 ✔
[ca/4fd2bb] process > add_gap_info (supp_vec)       [100%] 1 of 1 ✔
[25/f2652e] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
[83/88ea62] process > label_regions (label_reg)     [  0%] 0 of 1
[-        ] process > get_repetitiveness            -
[-        ] process > cleanup                       -
[-        ] process > bedToFasta                    -
[-        ] process > selfalign                     -
[-        ] process > simplify                      -
executor >  local (8)
[42/72c12e] process > make_diamond_db (makedb)      [100%] 1 of 1 ✔
[fd/9a4a4a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[9b/15f2f9] process > ref_nodes (non_ref_nodes)     [100%] 1 of 1 ✔
[a3/469cfe] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[e9/757e11] process > get_gaps (get_gaps)           [100%] 1 of 1 ✔
[ca/4fd2bb] process > add_gap_info (supp_vec)       [100%] 1 of 1 ✔
[25/f2652e] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
[83/88ea62] process > label_regions (label_reg)     [100%] 1 of 1, failed: 1 ✘
[-        ] process > get_repetitiveness            -
[-        ] process > cleanup                       -
[-        ] process > bedToFasta                    -
[-        ] process > selfalign                     -
[-        ] process > simplify                      -
[-        ] process > getfasta                      -
[-        ] process > getfasta_flanked              -
[-        ] process > blastx                        -
[-        ] process > abinitio                      -
[-        ] process > filter_abinitio               -
[-        ] process > abinitio_flank                -
[-        ] process > filter_abinitio_flank         -
[-        ] process > consolidate                   -
Error executing process > 'label_regions (label_reg)'

Caused by:
  Process `label_regions (label_reg)` terminated with an error exit status (1)

Command executed:

  bname=`basename -s '.bed' non_ref_nodes.labeled.lengths.merged.bed`
  06B-ClassifyRegions -i non_ref_nodes.labeled.lengths.merged.bed -o ${bname} -c contigs.txt -s scaffolds.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/bin/06B-ClassifyRegions", line 68, in <module>
      main()
    File "/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/bin/06B-ClassifyRegions", line 38, in main
      if os.path.exists(args.scaffolds): scaffolds = { i.split()[0]:int(i.strip().split()[1]) for i in open(args.scaffolds) }
    File "/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/bin/06B-ClassifyRegions", line 38, in <dictcomp>
      if os.path.exists(args.scaffolds): scaffolds = { i.split()[0]:int(i.strip().split()[1]) for i in open(args.scaffolds) }
  IndexError: list index out of range

Work dir:
  /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/83/88ea62e80b359d829a3e610928d2d1

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

It seems to be about the contigs.

RenzoTale88 commented 1 year ago

@Johnsonzcode I wasn't specific enough on this. I think the scaffold list needs both the name and the size of each sequence.
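
The traceback above reads the second whitespace-separated column of each line as an integer, so a two-column "name size" file should satisfy it. Here is a minimal sketch, assuming the sizes are derived from a FASTA file. The function names and the demo sequence names are hypothetical and not part of nf-GraphSeq.

```python
def fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the first word of the header as the sequence name.
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def bad_rows(lines):
    """Return 1-based line numbers that lack a name plus an integer size."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            bad.append(number)
    return bad


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    demo = [">CAU_Wild.scaffold_1 example", "ACGT", "AC", ">CAU_Wild.scaffold_2", "GGG"]
    for name, length in fasta_lengths(demo):
        print("{}\t{}".format(name, length))
```

`bad_rows` can be run on an existing scaffolds.txt to find the lines that would trigger the IndexError, for example `bad_rows(open("scaffolds.txt"))`.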

Johnsonzcode commented 1 year ago

@Johnsonzcode I wasn't specific enough on this, I think the scaffold list needs the name and the size of the sequence.

OK. Thank you. I will try with the size.

Johnsonzcode commented 1 year ago
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda -resume
N E X T F L O W  ~  version 22.10.4
Launching `nf-GraphSeq/main.nf` [kickass_euler] DSL2 - revision: 77e3a1fa1e
Non-ref sequence   v 0.5a
================================
PG                         : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg
Reference genome           : CAU_Wild
Sequence pool              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs                : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness  : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta             : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions           : 1000
Gaps flanking regions      : 1000
Novelty cutoff (ratio)     : 0.95

executor >  local (4)
[42/72c12e] process > make_diamond_db (makedb)      [100%] 1 of 1, cached: 1 ✔
[fd/9a4a4a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1, cached: 1 ✔
[9b/15f2f9] process > ref_nodes (non_ref_nodes)     [100%] 1 of 1, cached: 1 ✔
[a3/469cfe] process > add_support_vector (supp_vec) [100%] 1 of 1, cached: 1 ✔
[e9/757e11] process > get_gaps (get_gaps)           [100%] 1 of 1, cached: 1 ✔
[ca/4fd2bb] process > add_gap_info (supp_vec)       [100%] 1 of 1, cached: 1 ✔
[25/f2652e] process > combine_regions (combine_reg) [100%] 1 of 1, cached: 1 ✔
executor >  local (4)
[42/72c12e] process > make_diamond_db (makedb)      [100%] 1 of 1, cached: 1 ✔
[fd/9a4a4a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1, cached: 1 ✔
[9b/15f2f9] process > ref_nodes (non_ref_nodes)     [100%] 1 of 1, cached: 1 ✔
[a3/469cfe] process > add_support_vector (supp_vec) [100%] 1 of 1, cached: 1 ✔
[e9/757e11] process > get_gaps (get_gaps)           [100%] 1 of 1, cached: 1 ✔
[ca/4fd2bb] process > add_gap_info (supp_vec)       [100%] 1 of 1, cached: 1 ✔
[25/f2652e] process > combine_regions (combine_reg) [100%] 1 of 1, cached: 1 ✔
[b1/1b9810] process > label_regions (label_reg)     [100%] 1 of 1 ✔
[7d/64b4b2] process > get_repetitiveness (add_rept) [100%] 1 of 1 ✔
[08/1881e4] process > cleanup (cleanup)             [100%] 1 of 1 ✔
[ea/141dbd] process > bedToFasta (bed2fa)           [100%] 1 of 1, failed: 1 ✘
[-        ] process > selfalign                     -
[-        ] process > simplify                      -
[-        ] process > getfasta                      -
[-        ] process > getfasta_flanked              -
[-        ] process > blastx                        -
[-        ] process > abinitio                      -
[-        ] process > filter_abinitio               -
[-        ] process > abinitio_flank                -
[-        ] process > filter_abinitio_flank         -
[-        ] process > consolidate                   -
Error executing process > 'bedToFasta (bed2fa)'

Caused by:
  Process `bedToFasta (bed2fa)` terminated with an error exit status (1)

Command executed:

  python -c 'import sys; [sys.stdout.write( f">{line.strip().split()[0]}_{line.strip().split()[1]}-{line.strip().split()[2]}\n{line.strip().split()[-1]}\n" ) for line in open(sys.argv[1]) if "SEQID" not in line]' non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed > candidate.fa
  samtools faidx candidate.fa

Command exit status:
  1

Command output:
  (empty)

Command error:
  Could not build fai index candidate.fa.fai

Work dir:
  /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ea/141dbdd5580b7fa7c820e1e69817b9

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Something is wrong with candidate.fa. Maybe it could not be generated.

RenzoTale88 commented 1 year ago

@Johnsonzcode could you please share the content of .command.err and .command.out in work/ea/141dbdd5580b7fa7c820e1e69817b9?

Johnsonzcode commented 1 year ago
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ea/141dbdd5580b7fa7c820e1e69817b9/.command.err
Could not build fai index candidate.fa.fai
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ea/141dbdd5580b7fa7c820e1e69817b9/.command.out
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ea/141dbdd5580b7fa7c820e1e69817b9/.command.log
Could not build fai index candidate.fa.fai
RenzoTale88 commented 1 year ago

Thanks. Can you have a look at the content of non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed in the same folder? If it is empty, could you also have a look at .command.err and .command.out in work/08/1881e4*/?
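
For reference, the conversion done by the `python -c` one-liner in the failing step can be sketched as a small function. It follows the logged command (header built from the first three columns, sequence taken from the last column, rows containing "SEQID" skipped), with one difference that is an assumption of this sketch: blank lines are skipped instead of raising an error. A header-only BED gives empty FASTA text, which is probably why `samtools faidx` could not build the index.

```python
def bed_to_fasta(lines):
    """Convert BED-like rows to FASTA text, mirroring the logged one-liner.

    Rows containing "SEQID" are treated as the header and skipped.
    Blank lines are skipped too (an addition of this sketch).
    """
    records = []
    for line in lines:
        fields = line.strip().split()
        if not fields or "SEQID" in line:
            continue
        records.append(">{}_{}-{}\n{}\n".format(fields[0], fields[1], fields[2], fields[-1]))
    return "".join(records)


if __name__ == "__main__":
    header_only = ["SEQID\tBPI\tBPE\tSEQUENCE\n"]
    if not bed_to_fasta(header_only):
        print("No candidate regions: the FASTA would be empty.")
```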

Johnsonzcode commented 1 year ago
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ea/141dbdd5580b7fa7c820e1e69817b9/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed
SEQID   BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    ZSCORE  PVAL    SEQUENCE
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/
cat: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/: Is a directory
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/.command.out
Average autosomal repetitiveness:  NA
St.Dev. autosomal repetitiveness:  NA

Initial regions (#):  0
Initial regions (bp):  0
Saved regions (#):  0
Saved regions (bp):  0

(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/.command.err
Warning message:
In mean.default(repval[, 4]) :
  argument is not numeric or logical: returning NA
Warning message:
In var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
  NAs introduced by coercion
RenzoTale88 commented 1 year ago

I think there are two separate issues. The one that is easy to fix might be in the repetitiveness.txt file. Does it have a header? If it does, remove it. The second might require a bit more digging. Could you look into the different bed files in work/08/1881e4e464402873945f8f01a66ad8? At some point one of them should be empty, and that is the failing stage.
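
The R warning above complains that column 4 of the repetitiveness table is not numeric, which is what a header line would cause. A minimal sketch for spotting such lines follows. It assumes a whitespace-delimited file whose fourth column holds the repetitiveness value, which is inferred from the warning and not from the workflow documentation.

```python
def is_number(text):
    """Return True when `text` can be parsed as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def non_numeric_rows(lines, column=3):
    """Return 1-based line numbers whose `column` (0-based) is not numeric.

    Blank lines are ignored; lines with too few columns are reported.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) <= column or not is_number(fields[column]):
            bad.append(number)
    return bad


if __name__ == "__main__":
    # Hypothetical content, for illustration only.
    demo = ["chrom start end repetitiveness", "1 0 100000 0.42"]
    print(non_numeric_rows(demo))
```

If only line 1 is reported for the real file, removing that line is likely all that is needed.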

Johnsonzcode commented 1 year ago
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/
.command.begin
.command.err
.command.log
.command.out
.command.run
.command.sh
.exitcode
non_ref_nodes.labeled.lengths.merged.seqtype.masked.bed
non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.bed
non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.bed
non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.bed
non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.bed
non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed
repetitiveness.txt
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.bed
#SEQID  BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    SEQUENCE
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed
SEQID   BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    ZSCORE  PVAL    SEQUENCE
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/*.bed
#SEQID  BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    SEQUENCE
#SEQID  BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    SEQUENCE
#SEQID  BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    SEQUENCE
#SEQID  BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    SEQUENCE
#SEQID  BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    SEQUENCE
SEQID   BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    ZSCORE  PVAL    SEQUENCE

All the bed files in this folder are empty.

RenzoTale88 commented 1 year ago

Then we need to go back even further. Have a look at the content of work/7d/64b4b2*/ to check whether the bed files are empty and whether there is an error in the logs there. If so, you can keep going backwards until you reach the site of the issue. The working folder of each stage is shown before the process name while nextflow is running. For instance, for [08/1881e4] process > cleanup (cleanup) [100%] 1 of 1 ✔ the folder will be work/08/1881e4*/
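
The backwards search described above can be sketched as a small helper that expands the short hash printed by nextflow and counts the data rows in each bed file of that task folder. The function names are hypothetical, and treating any line containing "SEQID" as the header is an assumption based on the files shown in this thread.

```python
import glob
import os


def resolve_work_dir(task_hash, work_root="work"):
    """Expand a short hash such as '08/1881e4' to the matching task folder(s)."""
    return sorted(glob.glob(os.path.join(work_root, task_hash + "*")))


def count_data_rows(path):
    """Count non-blank lines that are not the SEQID header."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip() and "SEQID" not in line)


def report(task_hash, work_root="work"):
    """Return {bed file name: number of data rows} for one task."""
    counts = {}
    for folder in resolve_work_dir(task_hash, work_root):
        for bed in sorted(glob.glob(os.path.join(folder, "*.bed"))):
            counts[os.path.basename(bed)] = count_data_rows(bed)
    return counts


if __name__ == "__main__":
    # Hashes taken from the run shown in this thread, latest stage first.
    for task_hash in ["08/1881e4", "7d/64b4b2", "b1/1b9810"]:
        print(task_hash, report(task_hash))
```

The first stage, going backwards, whose input still has data rows while its output has none is the one to inspect more closely.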



Johnsonzcode commented 1 year ago
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda -resume
N E X T F L O W  ~  version 22.10.4
Launching `nf-GraphSeq/main.nf` [ridiculous_lamarr] DSL2 - revision: 77e3a1fa1e
Non-ref sequence   v 0.5a
================================
PG                         : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg
Reference genome           : CAU_Wild
Sequence pool              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs                : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness  : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta             : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions           : 1000
Gaps flanking regions      : 1000
Novelty cutoff (ratio)     : 0.95

executor >  local (1)
[42/72c12e] process > make_diamond_db (makedb)      [100%] 1 of 1, cached: 1 ✔
[fd/9a4a4a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1, cached: 1 ✔
[9b/15f2f9] process > ref_nodes (non_ref_nodes)     [100%] 1 of 1, cached: 1 ✔
[a3/469cfe] process > add_support_vector (supp_vec) [100%] 1 of 1, cached: 1 ✔
[e9/757e11] process > get_gaps (get_gaps)           [100%] 1 of 1, cached: 1 ✔
[ca/4fd2bb] process > add_gap_info (supp_vec)       [100%] 1 of 1, cached: 1 ✔
[25/f2652e] process > combine_regions (combine_reg) [100%] 1 of 1, cached: 1 ✔
executor >  local (1)
[42/72c12e] process > make_diamond_db (makedb)      [100%] 1 of 1, cached: 1 ✔
[fd/9a4a4a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1, cached: 1 ✔
[9b/15f2f9] process > ref_nodes (non_ref_nodes)     [100%] 1 of 1, cached: 1 ✔
[a3/469cfe] process > add_support_vector (supp_vec) [100%] 1 of 1, cached: 1 ✔
[e9/757e11] process > get_gaps (get_gaps)           [100%] 1 of 1, cached: 1 ✔
[ca/4fd2bb] process > add_gap_info (supp_vec)       [100%] 1 of 1, cached: 1 ✔
[25/f2652e] process > combine_regions (combine_reg) [100%] 1 of 1, cached: 1 ✔
[b1/1b9810] process > label_regions (label_reg)     [100%] 1 of 1, cached: 1 ✔
[7d/64b4b2] process > get_repetitiveness (add_rept) [100%] 1 of 1, cached: 1 ✔
[08/1881e4] process > cleanup (cleanup)             [100%] 1 of 1, cached: 1 ✔
[c4/c8e165] process > bedToFasta (bed2fa)           [100%] 1 of 1, failed: 1 ✘
[-        ] process > selfalign                     -
[-        ] process > simplify                      -
[-        ] process > getfasta                      -
[-        ] process > getfasta_flanked              -
[-        ] process > blastx                        -
[-        ] process > abinitio                      -
[-        ] process > filter_abinitio               -
[-        ] process > abinitio_flank                -
[-        ] process > filter_abinitio_flank         -
[-        ] process > consolidate                   -
Error executing process > 'bedToFasta (bed2fa)'

Caused by:
  Process `bedToFasta (bed2fa)` terminated with an error exit status (1)

Command executed:

  python -c 'import sys; [sys.stdout.write( f">{line.strip().split()[0]}_{line.strip().split()[1]}-{line.strip().split()[2]}\n{line.strip().split()[-1]}\n" ) for line in open(sys.argv[1]) if "SEQID" not in line]' non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed > candidate.fa
  samtools faidx candidate.fa

Command exit status:
  1

Command output:
  (empty)

Command error:
  Could not build fai index candidate.fa.fai

Work dir:
  /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/c4/c8e165a2a05ebd6cec3bedd355b523

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$  head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/*.bed
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.bed <==
#SEQID  BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    SEQUENCE

==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.bed <==
#SEQID  BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    SEQUENCE

==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.bed <==
#SEQID  BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    SEQUENCE

==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.bed <==
#SEQID  BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    SEQUENCE

==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.bed <==
#SEQID  BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    SEQUENCE

==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed <==
SEQID   BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    ZSCORE  PVAL    SEQUENCE
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$  head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/7d/64b4b2
head: cannot open '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/7d/64b4b2' for reading: No such file or directory
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$  head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/7d/64b4b2bc31fa46e57e33fb3d3c4592/*.bed
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/7d/64b4b2bc31fa46e57e33fb3d3c4592/non_ref_nodes.labeled.lengths.merged.seqtype.bed <==
#SEQID  BPI     BPE     NODES   N_NODES STRANDS SEQS    N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION

==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/7d/64b4b2bc31fa46e57e33fb3d3c4592/non_ref_nodes.labeled.lengths.merged.seqtype.masked.bed <==
#SEQID  BPI     BPE     NODES   N_NODES STRANDS NODE_SEQUENCE   N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION  N_MASKED        N_NT    RATIO_MASKED    SEQUENCE
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$  head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/b1/1b9810662cd70463bcf6239de64450/*.bed
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/b1/1b9810662cd70463bcf6239de64450/non_ref_nodes.labeled.lengths.merged.bed <==
SEQID   BPI     BPE     NODES   N_NODES STRANDS SEQS    N_CLOSE_TO_GAPS NODES_LENGTH

==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/b1/1b9810662cd70463bcf6239de64450/non_ref_nodes.labeled.lengths.merged.seqtype.bed <==
#SEQID  BPI     BPE     NODES   N_NODES STRANDS SEQS    N_CLOSE_TO_GAPS NODES_LENGTH    REGION_SIZE     CLASSIFICATION
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$  head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/25/f2652e682b9119b75000e87cd4c38d/*.bed
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/25/f2652e682b9119b75000e87cd4c38d/non_ref_nodes.labeled.bed <==

==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/25/f2652e682b9119b75000e87cd4c38d/non_ref_nodes.labeled.lengths.bed <==

==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/25/f2652e682b9119b75000e87cd4c38d/non_ref_nodes.labeled.lengths.merged.bed <==
SEQID   BPI     BPE     NODES   N_NODES STRANDS SEQS    N_CLOSE_TO_GAPS NODES_LENGTH
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$  head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ca/4fd2bb1906c75070e0e943587addbf/*.bed
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ca/4fd2bb1906c75070e0e943587addbf/gaps.bed <==
CAU_Wild.chr1   64186   66286
CAU_Wild.chr1   118671  120678
CAU_Wild.chr1   216126  218220
CAU_Wild.chr1   307007  309107
CAU_Wild.chr1   342517  344564
CAU_Wild.chr1   463082  465168
CAU_Wild.chr1   595503  597603
CAU_Wild.chr1   12966826        12968885
CAU_Wild.chr1   12976926        12979026
CAU_Wild.chr1   12999216        13001301

==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ca/4fd2bb1906c75070e0e943587addbf/non_ref_nodes.labeled.bed <==

==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ca/4fd2bb1906c75070e0e943587addbf/non_ref_nodes.noNmers.bed <==

==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ca/4fd2bb1906c75070e0e943587addbf/non_ref_nodes.supvec.bed <==
Johnsonzcode commented 1 year ago

Maybe removing the header of repetitiveness.txt is a better choice.

RenzoTale88 commented 1 year ago

The error seems to have occurred at the beginning of the workflow. Could you please share the logs in work/fd/9a4a4a*/ ?
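
One way to find the step where the records disappear is to count the data rows in every BED file under the Nextflow work directory. The following is a minimal sketch, not part of the pipeline: the function names are hypothetical, and it assumes that header lines can be recognised by the `SEQID` token, the same test used by the `bedToFasta` one-liner shown in the error message.

```python
from pathlib import Path


def count_bed_records(bed_path):
    """Count non-empty lines that are not a header (no 'SEQID' token)."""
    n = 0
    with open(bed_path) as handle:
        for line in handle:
            if line.strip() and "SEQID" not in line:
                n += 1
    return n


def report_bed_records(work_dir):
    """Return {path: record_count} for every .bed file under work_dir."""
    beds = sorted(Path(work_dir).rglob("*.bed"))
    return {str(p): count_bed_records(p) for p in beds}
```

Calling `report_bed_records("work")` and printing the entries with a count of zero shows which intermediate files hold only a header. If the candidate BED has no records, `candidate.fa` is empty and `samtools faidx` cannot index it.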



Johnsonzcode commented 1 year ago

I removed the header and reran the workflow, but I get the same error.

(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda
N E X T F L O W  ~  version 22.10.4
Launching `nf-GraphSeq/main.nf` [sleepy_feynman] DSL2 - revision: 77e3a1fa1e
Non-ref sequence   v 0.5a
================================
PG                         : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg
Reference genome           : CAU_Wild
Sequence pool              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs                : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness  : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta             : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions           : 1000
Gaps flanking regions      : 1000
Novelty cutoff (ratio)     : 0.95

executor >  local (11)
[a6/233f28] process > make_diamond_db (makedb)      [100%] 1 of 1 ✔
[d0/f5d91d] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[82/18a7f3] process > ref_nodes (non_ref_nodes)     [100%] 1 of 1 ✔
[40/5ee068] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[c6/138589] process > get_gaps (get_gaps)           [100%] 1 of 1 ✔
[b1/40f191] process > add_gap_info (supp_vec)       [100%] 1 of 1 ✔
[60/091eea] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
executor >  local (11)
[a6/233f28] process > make_diamond_db (makedb)      [100%] 1 of 1 ✔
[d0/f5d91d] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[82/18a7f3] process > ref_nodes (non_ref_nodes)     [100%] 1 of 1 ✔
[40/5ee068] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[c6/138589] process > get_gaps (get_gaps)           [100%] 1 of 1 ✔
[b1/40f191] process > add_gap_info (supp_vec)       [100%] 1 of 1 ✔
[60/091eea] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
[cc/04bf6b] process > label_regions (label_reg)     [100%] 1 of 1 ✔
[ed/e72c53] process > get_repetitiveness (add_rept) [100%] 1 of 1 ✔
[09/427483] process > cleanup (cleanup)             [100%] 1 of 1 ✔
[d0/9427c8] process > bedToFasta (bed2fa)           [100%] 1 of 1, failed: 1 ✘
[-        ] process > selfalign                     -
[-        ] process > simplify                      -
[-        ] process > getfasta                      -
[-        ] process > getfasta_flanked              -
[-        ] process > blastx                        -
[-        ] process > abinitio                      -
[-        ] process > filter_abinitio               -
[-        ] process > abinitio_flank                -
[-        ] process > filter_abinitio_flank         -
[-        ] process > consolidate                   -
Error executing process > 'bedToFasta (bed2fa)'

Caused by:
  Process `bedToFasta (bed2fa)` terminated with an error exit status (1)

Command executed:

  python -c 'import sys; [sys.stdout.write( f">{line.strip().split()[0]}_{line.strip().split()[1]}-{line.strip().split()[2]}\n{line.strip().split()[-1]}\n" ) for line in open(sys.argv[1]) if "SEQID" not in line]' non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed > candidate.fa
  samtools faidx candidate.fa

Command exit status:
  1

Command output:
  (empty)

Command error:
  Could not build fai index candidate.fa.fai

Work dir:
  /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/d0/9427c85b509d166b380685e09cd553

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$  head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/d0/f5d91d7dd754401d0e25bcf6c72eca/.command.log
Read input PG
Found:
 - 41 nodes
 - 0 edges
 - 41 paths
Getting reference paths
Getting reference node ids
Getting query paths
Getting query-specific nodes
Save nodes and their positions in the different genomes
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/d0/f5d91d7dd754401d0e25bcf6c72eca/.command.log
Read input PG
Found:
 - 41 nodes
 - 0 edges
 - 41 paths
Getting reference paths
Getting reference node ids
Getting query paths
Getting query-specific nodes
Save nodes and their positions in the different genomes
RenzoTale88 commented 1 year ago

I'm afraid something seems to be wrong with the graph. It appears it's got only 41 nodes, and no edges (connections between nodes). You probably need to regenerate it and try again.
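
The same check can be scripted against the `.command.log` of the `non_ref_nodes` step. This is a minimal sketch under the assumption that the log keeps the ` - N nodes / edges / paths` layout shown above; the function names are hypothetical and not part of nf-GraphSeq.

```python
import re


def parse_graph_summary(log_text):
    """Extract the ' - N nodes/edges/paths' counts from the log text."""
    counts = {}
    for number, kind in re.findall(r"-\s*(\d+)\s+(nodes|edges|paths)", log_text):
        counts[kind] = int(number)
    return counts


def looks_degenerate(counts):
    """Flag a graph that has nodes but no edges connecting them."""
    return counts.get("nodes", 0) > 0 and counts.get("edges", 0) == 0
```

For the log in this thread, the parsed counts are 41 nodes, 0 edges and 41 paths, so `looks_degenerate` returns `True`.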



Johnsonzcode commented 1 year ago

I generated the graph with the scripts here.

Johnsonzcode commented 1 year ago

The phylogenetic tree is calculated by mashtree.

Johnsonzcode commented 1 year ago

The configuration file looks like this:

((liancheng:0.00321,guangxi:0.00270):0.00012,pekin:0.00374,((tufted:0.01022,CAU_Wild:0.00349):0.00037,laying:0.00293):0.00011);
pekin   ../genome_chr/pekin_CHR_named.fa
tufted  ../genome_chr/tufted_duck_CHR_named.fa
laying  ../genome_chr/laying_CHR_named.fa
liancheng       ../genome_chr/liancheng_CHR_named.fa
guangxi ../genome_chr/guangxi_CHR_named.fa
CAU_Wild        ../genome_chr/CAU_Wild_CHR_named.fa
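
A quick consistency check on a file laid out like this (a Newick tree on the first line, then one `name path` pair per line) is to confirm that every leaf in the tree has a FASTA entry and vice versa. The sketch below is only an illustration: the function names are hypothetical, and the regular expression assumes plain leaf names without quotes.

```python
import re


def parse_seqfile(text):
    """Split the file into tree leaf names and a {name: fasta_path} mapping."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    tree, entries = lines[0], lines[1:]
    # Leaf names are the labels that directly follow '(' or ',' in the tree.
    leaves = set(re.findall(r"[(,]\s*([^():,;\s]+)", tree))
    paths = dict(ln.split(None, 1) for ln in entries)
    return leaves, paths


def check_seqfile(text):
    """Return (leaves without a FASTA entry, FASTA entries not in the tree)."""
    leaves, paths = parse_seqfile(text)
    return sorted(leaves - set(paths)), sorted(set(paths) - leaves)
```

Both returned lists are empty for the configuration above, so the six genome names in the tree match the six FASTA entries.
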
RenzoTale88 commented 1 year ago

You can check whether the HAL alignments are fine. If so, you can check the conversion to PG using hal2vg. I suspect that is where the process failed. You can try re-converting it and check that it is valid before proceeding with the analyses.



Johnsonzcode commented 1 year ago

The log file from cactus

[2022-12-30T19:59:26+0800] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/61b138b1cd1a5a4284708cac2d27dec5/2ffc/worker_log.txt
[2022-12-30T19:59:28+0800] [MainThread] [I] [toil.leader] Finished toil run successfully.
[2022-12-30T19:59:28+0800] [MainThread] [I] [toil.realtimeLogger] Stopping real-time logging server.
[2022-12-30T19:59:29+0800] [MainThread] [I] [toil.realtimeLogger] Joining real-time logging server thread.
[2022-12-30T20:00:14+0800] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/storage-02/zhaoqiangsen/pan_genome/mwgs/jobStore)
[2022-12-30T20:00:14+0800] [MainThread] [I] [toil.statsAndLogging] Cactus has finished after 39190.21106318687 seconds

It looks fine. The hal2vg step reported no errors. The file sizes for the six duck genomes (about 1 Gb per genome) are:

507M Dec 31 09:27 five_duck_align.vg
   0 Dec 31 09:24 hal2vg.sh.log
 221 Dec 31 09:24 hal2vg.sh
 19M Dec 31 09:24 nohup.out
3.7G Dec 30 19:59 five_duck_align.hal
1.3K Dec 30 09:06 cactus.sh
 389 Dec 29 23:01 duck_pangenome.txt
RenzoTale88 commented 1 year ago

When I say validate, I mean with the appropriate tool (halValidate or VG). Nevertheless, is the input in packed graph (PG) format? The VG format does not work with the script.



Johnsonzcode commented 1 year ago

OK, thank you so much. I am checking the HAL file with halValidate. I used five_duck_align.vg as the input. How can I check whether it is in packed graph (PG) format?

RenzoTale88 commented 1 year ago

You can see the guidelines on the VG wiki. You can also convert with the vg view command and the appropriate input/output options. You can also specify the output when converting with hal2vg.



Johnsonzcode commented 1 year ago

First, I checked the HAL file five_duck_align.hal with halValidate:

File valid

Second, I checked the VG file five_duck_align.vg generated by hal2vg:

(graphseq) [poultrylab1@pbsnode01 mwgs]$ vg validate five_duck_align.vg
graph: valid

Third, I converted it to packed graph format and checked five_duck_align.vg.packed.graph:

~/zhaoqiangsen/software/cactus-bin-v2.2.4/bin/vg convert -p  five_duck_align.vg> five_duck_align.vg.packed.graph
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$  vg validate five_duck_align.vg.packed.graph
graph: valid
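
The conversion and validation above can be wrapped in a small script so that both steps run together before the graph is passed to Nextflow. This is a minimal sketch that only reuses the two commands shown in this comment (`vg convert -p` and `vg validate`); the function names and the `vg_bin` parameter are hypothetical, and it assumes a `vg` binary is available on the `PATH` or given explicitly.

```python
import subprocess


def pg_commands(vg_file, pg_file, vg_bin="vg"):
    """Return the convert and validate commands as argument lists."""
    return [vg_bin, "convert", "-p", vg_file], [vg_bin, "validate", pg_file]


def convert_and_validate(vg_file, pg_file, vg_bin="vg"):
    """Write the packed graph to pg_file, then return the validator's message."""
    convert_cmd, validate_cmd = pg_commands(vg_file, pg_file, vg_bin)
    with open(pg_file, "wb") as out:
        # The converted graph is written to standard output, as in the shell redirect.
        subprocess.run(convert_cmd, stdout=out, check=True)
    result = subprocess.run(validate_cmd, capture_output=True, text=True, check=True)
    return (result.stdout + result.stderr).strip()
```

With `check=True`, a non-zero exit status from either command raises `subprocess.CalledProcessError`, so a failed conversion cannot go unnoticed.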

I then used five_duck_align.vg.packed.graph as the Nextflow input:

(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda
N E X T F L O W  ~  version 22.10.4
Launching `nf-GraphSeq/main.nf` [naughty_brenner] DSL2 - revision: 77e3a1fa1e
Non-ref sequence   v 0.5a
================================
PG                         : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg.packed.graph
Reference genome           : CAU_Wild
Sequence pool              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs                : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness  : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta             : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions           : 1000
Gaps flanking regions      : 1000
Novelty cutoff (ratio)     : 0.95

executor >  local (11)
[25/b6a400] process > make_diamond_db (makedb)      [100%] 1 of 1 ✔
[b1/d4c7f5] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[d1/af6ef8] process > ref_nodes (non_ref_nodes)     [100%] 1 of 1 ✔
[16/71bc27] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[66/2e6866] process > get_gaps (get_gaps)           [100%] 1 of 1 ✔
[d8/9af0bb] process > add_gap_info (supp_vec)       [100%] 1 of 1 ✔
[10/a98c00] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
[90/2e8953] process > label_regions (label_reg)     [100%] 1 of 1 ✔
[4f/07299d] process > get_repetitiveness (add_rept) [100%] 1 of 1 ✔
[a9/196c5c] process > cleanup (cleanup)             [100%] 1 of 1 ✔
[43/a5c98e] process > bedToFasta (bed2fa)           [100%] 1 of 1, failed: 1 ✘
[-        ] process > selfalign                     -
[-        ] process > simplify                      -
[-        ] process > getfasta                      -
[-        ] process > getfasta_flanked              -
[-        ] process > blastx                        -
[-        ] process > abinitio                      -
[-        ] process > filter_abinitio               -
[-        ] process > abinitio_flank                -
[-        ] process > filter_abinitio_flank         -
[-        ] process > consolidate                   -
Error executing process > 'bedToFasta (bed2fa)'

Caused by:
  Process `bedToFasta (bed2fa)` terminated with an error exit status (1)

Command executed:

  python -c 'import sys; [sys.stdout.write( f">{line.strip().split()[0]}_{line.strip().split()[1]}-{line.strip().split()[2]}\n{line.strip().split()[-1]}\n" ) for line in open(sys.argv[1]) if "SEQID" not in line]' non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed > candidate.fa
  samtools faidx candidate.fa

Command exit status:
  1

Command output:
  (empty)

Command error:
  Could not build fai index candidate.fa.fai

Work dir:
  /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/43/a5c98ea8c53d2fb63c8095efdf4665

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
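
The zero-byte non_ref_nodes.bed shown in the work directory listing suggests that the candidate BED reaching bedToFasta had no data rows, so candidate.fa came out empty and samtools faidx could not index it. The following is a minimal sketch of the same BED-to-FASTA conversion with an explicit empty-input check, assuming that this is what broke the step. The function names are illustrative and are not part of nf-GraphSeq, and the "at least four columns" rule is an assumption added here.

```python
def bed_to_fasta_records(bed_lines):
    """Yield (header, sequence) pairs from candidate BED lines.

    Mirrors the pipeline one-liner: the header is SEQID_START-END and the
    sequence is taken from the last column. Lines containing "SEQID" are
    treated as the header row and skipped.
    """
    for line in bed_lines:
        if "SEQID" in line:
            continue
        fields = line.strip().split()
        # Assumption: a usable row has at least ID, start, end and a sequence.
        if len(fields) < 4:
            continue
        yield f"{fields[0]}_{fields[1]}-{fields[2]}", fields[-1]


def to_fasta(bed_lines):
    """Return FASTA text, or raise ValueError if no candidate rows exist."""
    records = list(bed_to_fasta_records(bed_lines))
    if not records:
        raise ValueError("no candidate regions: check the upstream BED files")
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Called on the lines of the candidate BED, this fails early with a clear message instead of leaving an empty candidate.fa for samtools faidx to reject.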

(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ bash README
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat work/b1/d4c7f55bace42950640621585dfbda/
cat: work/b1/d4c7f55bace42950640621585dfbda/: Is a directory
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ l work/b1/d4c7f55bace42950640621585dfbda/
total 0
-rw-rw-r-- 1 poultrylab1 poultrylab1  0 Jan  3 16:50 non_ref_nodes.bed
lrwxrwxrwx 1 poultrylab1 poultrylab1 72 Jan  3 16:50 five_duck_align.vg.packed.graph -> /storage-02/zhaoqiangsen/pan_genome/mwgs/five_duck_align.vg.packed.graph
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ ls work/b1/d4c7f55bace42950640621585dfbda/
five_duck_align.vg.packed.graph  non_ref_nodes.bed
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat work/b1/d4c7f55bace42950640621585dfbda/
.command.begin                   .command.log                     .command.run                     .exitcode                        non_ref_nodes.bed
.command.err                     .command.out                     .command.sh                      five_duck_align.vg.packed.graph
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat work/b1/d4c7f55bace42950640621585dfbda/.command.log
Read input PG
Found:
 - 41 nodes
 - 0 edges
 - 41 paths
Getting reference paths
Getting reference node ids
Getting query paths
Getting query-specific nodes
Save nodes and their positions in the different genomes

There are still 0 edges. How can I find the reason?
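
One way to catch this situation before the later steps fail is to read the counts that non_ref_nodes prints to .command.log. The sketch below parses the "Found:" summary and flags a graph with no edges and exactly one node per path. That heuristic is an assumption made for illustration, not a rule from vg or from the workflow, and the function names are hypothetical.

```python
import re


def parse_graph_summary(log_text):
    """Extract the node/edge/path counts printed after 'Found:' in the log."""
    counts = {}
    pattern = r"^\s*-\s*(\d+)\s+(nodes|edges|paths)\s*$"
    for match in re.finditer(pattern, log_text, re.MULTILINE):
        counts[match.group(2)] = int(match.group(1))
    return counts


def looks_unaligned(counts):
    """Heuristic: no edges and one node per path suggests that the graph
    holds unaligned sequences rather than an alignment of several genomes."""
    return counts.get("edges") == 0 and counts.get("nodes") == counts.get("paths")
```

For the log above, the counts are 41 nodes, 0 edges and 41 paths, so the check would flag the graph as suspicious before any BED file is produced.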

RenzoTale88 commented 1 year ago

I'm afraid it is quite difficult to say without access to the data. My only guess is that it is running out of memory, though it's puzzling that it is not crashing. Are you running it with enough memory (>128G)? Can you share the graph so that I can test what is going wrong?


Johnsonzcode commented 1 year ago

Yes, I have enough memory and nothing is crashing. Which graph file do you want to test?

Johnsonzcode commented 1 year ago

Maybe this is the reason: I used the following command for the conversion.

hal2vg --noAncestors --hdf5InMemory --rootGenome CAU_Wild five_duck_align.hal > five_duck_align.vg

Actually, I don't know what --noAncestors and --rootGenome do. I just used the scripts from here

RenzoTale88 commented 1 year ago

Hi, I would try without --rootGenome and see if it works. If you can share, please provide the HAL and VG files. Share them with @.*** and I'll have a look over the next few days.


Johnsonzcode commented 1 year ago

Your email address has been hidden. I can share the files, but the HAL file is huge (3 GB). I will also try to rerun without --rootGenome.

RenzoTale88 commented 1 year ago

The problem is indeed --rootGenome. That option restricts the conversion to the genomes in the clade below the one you specify. You have to specify --refGenomes instead, followed by your reference.
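
To illustrate why a clade restriction can empty the graph, here is a small sketch that does not call hal2vg at all. It only models the wording of the hal2vg help text, "process only genomes in clade with specified root", on a made-up tree. Every genome name except CAU_Wild is hypothetical, and the real tree in five_duck_align.hal may look different.

```python
# Hypothetical alignment tree: parent -> children. Only CAU_Wild is a name
# taken from this thread; the ancestors and other leaves are placeholders.
TREE = {
    "Anc0": ["Anc1", "CAU_Wild"],
    "Anc1": ["duck_A", "duck_B"],
}


def genomes_in_clade(children, root):
    """Return the root genome followed by all of its descendants."""
    selected = [root]
    for child in children.get(root, []):
        selected.extend(genomes_in_clade(children, child))
    return selected


# A leaf has no descendants, so rooting the clade at it keeps one genome only.
leaf_clade = genomes_in_clade(TREE, "CAU_Wild")
full_clade = genomes_in_clade(TREE, "Anc0")
```

In this toy tree, rooting the clade at the leaf CAU_Wild selects that genome alone, while rooting at Anc0 selects every genome. If the real conversion behaved the same way, the graph would contain only reference sequences with nothing aligned to them, which would be consistent with the 41 nodes, 0 edges and 41 paths reported earlier.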


Johnsonzcode commented 1 year ago

The new version of hal2vg has no --refGenomes option, so I used the wrong option at that time. I am trying to fix it. All options:

USAGE:
/storage-01/poultrylab1/zhaoqiangsen/software/cactus-bin-v2.2.4/bin/hal2vg [Options] <halFile>

ARGUMENTS:
halFile:   input hal file

OPTIONS:
--cacheBytes <value>:          obsolete name for --hdf5CacheBytes [default =
                               15728640]
--cacheMDC <value>:            obsolete name for --hdf5CacheMDC  [default = 113]
--cacheRDC <value>:            obsolete name for --hdf5CacheRDC [default = 599999]
--cacheW0 <value>:             obsolete name for --hdf5CacheW0 [default = 0.75]
--chop <value>:                chop up nodes in output graph so they are not longer
                               than given length [default = 0]
--format <value>:              choose the back-end storage format. [default = hdf5]
--hdf5CacheBytes <value>:      maximum size in bytes of regular hdf5 cache [default =
                                15728640]
--hdf5CacheMDC <value>:        number of metadata slots in hdf5 cache [default = 113]
--hdf5CacheRDC <value>:        number of regular slots in hdf5 cache.  should be a
                               prime number ~= 10 * DefaultCacheRDCBytes / chunk
                               [default = 599999]
--hdf5CacheW0 <value>:         w0 parameter for hdf5 cache [default = 0.75]
--hdf5InMemory:                load all data in memory (and disable hdf5 cache)
                               [default = 0]
--help:                        display this help page [default = 0]
--ignoreGenomes <value>:       comma-separated (no spaces) list of genomes to ignore
                               [default = ""]
--inMemory:                    obsolete name for --hdf5InMemory [default = 0]
--noAncestors:                 don't write ancestral paths, nor sequence exclusive to
                                ancestral genomes [default = 0]
--onlySequenceNames:           use only sequence names for output names.  By default,
                                the UCSC convention of Genome.Sequence is used
                               [default = 0]
--outputFormat <value>:        output graph format in {pg, hg, odgi} [default=pg]
                               [default = pg]
--progress:                    show progress [default = 0]
--rootGenome <value>:          process only genomes in clade with specified root (HAL
                                root if empty) [default = ""]
--targetGenomes <value>:       comma-separated (no spaces) list of target genomes
                               (others are excluded) (all leaves if empty) [default =
                                ""]

RenzoTale88 commented 1 year ago

I think the version on the GitHub repository has it (see hal2vg.cpp https://github.com/ComparativeGenomicsToolkit/hal2vg/blob/master/hal2vg.cpp). If using that version doesn't work either, I might need more time to figure out what changed in the software and edit the workflow.
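
Since the available options differ between hal2vg builds, it can help to check what a given binary actually accepts before running the conversion. The sketch below parses option names out of help text in the layout pasted above. The helper names are hypothetical, and `installed_options` assumes that the binary prints its help on stdout or stderr when given --help, as the listing above indicates.

```python
import re
import subprocess


def option_names(help_text):
    """Collect every '--name' that starts a line of the help text."""
    return set(re.findall(r"^\s*(--[A-Za-z0-9]+)", help_text, re.MULTILINE))


def installed_options(binary="hal2vg"):
    """Run `<binary> --help` and parse whatever it prints."""
    result = subprocess.run([binary, "--help"], capture_output=True, text=True)
    # Help may go to stdout or stderr depending on the build, so read both.
    return option_names(result.stdout + result.stderr)
```

For example, `"--refGenomes" in installed_options("/path/to/hal2vg")` would show whether a particular build supports the option before the workflow depends on it.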


Johnsonzcode commented 1 year ago

OK, thank you. I will try with this version.

Johnsonzcode commented 1 year ago

After using the previous hal2vg version, another error occurred:

(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda
N E X T F L O W  ~  version 22.10.4
Launching `nf-GraphSeq/main.nf` [trusting_hugle] DSL2 - revision: 77e3a1fa1e
Non-ref sequence   v 0.5a
================================
PG                         : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg1
Reference genome           : CAU_Wild
Sequence pool              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs                : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs              : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness  : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta             : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions           : 1000
Gaps flanking regions      : 1000
Novelty cutoff (ratio)     : 0.95

executor >  local (13)
[fd/88c4aa] process > make_diamond_db (makedb)      [100%] 1 of 1 ✔
[60/0be13a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[1c/9c58fb] process > ref_nodes (non_ref_nodes)     [100%] 1 of 1 ✔
[24/e3fe87] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[10/77e52c] process > get_gaps (get_gaps)           [100%] 1 of 1 ✔
[56/b42bc1] process > add_gap_info (supp_vec)       [100%] 1 of 1 ✔
[cc/27d451] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
[ed/da7aeb] process > label_regions (label_reg)     [100%] 1 of 1 ✔
[9e/4dcc36] process > get_repetitiveness (add_rept) [100%] 1 of 1 ✔
[c5/157bb6] process > cleanup (cleanup)             [100%] 1 of 1 ✔
[cb/dee46e] process > bedToFasta (bed2fa)           [100%] 1 of 1 ✔
[ee/7a2598] process > selfalign (selfalign)         [100%] 1 of 1 ✔
[60/2c4f95] process > simplify (simplify)           [100%] 1 of 1, failed: 1 ✘
[-        ] process > getfasta                      -
[-        ] process > getfasta_flanked              -
[-        ] process > blastx                        -
[-        ] process > abinitio                      -
[-        ] process > filter_abinitio               -
[-        ] process > abinitio_flank                -
[-        ] process > filter_abinitio_flank         -
[-        ] process > consolidate                   -
Error executing process > 'simplify (simplify)'

Caused by:
  Process `simplify (simplify)` terminated with an error exit status (1)

Command executed:

  09C-DetectDuplicateContigs alignments.blasttab candidate.fa.fai candidate.clump.txt
  09D-faiToBed candidate.clump.txt > candidate.clump.bed

Command exit status:
  1

Command output:
  (empty)

Command error:
  ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
  ✔ ggplot2 3.3.3     ✔ purrr   0.3.4
  ✔ tibble  3.1.2     ✔ dplyr   1.0.6
  ✔ tidyr   1.1.3     ✔ stringr 1.4.0
  ✔ readr   1.4.0     ✔ forcats 0.5.1
  ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
  ✖ dplyr::filter() masks stats::filter()
  ✖ dplyr::lag()    masks stats::lag()

  Attaching package: ‘reshape2’

  The following object is masked from ‘package:tidyr’:

      smiths

  Error: arrange() failed at implicit mutate() step.
  * Problem with `mutate()` column `..1`.
  ℹ `..1 = V2`.
  ✖ object 'V2' not found
  Backtrace:
       █
    1. ├─contigs %>% arrange(desc(V2))
    2. ├─dplyr::arrange(., desc(V2))
    3. ├─dplyr:::arrange.data.frame(., desc(V2))
    4. │ └─dplyr:::arrange_rows(.data, dots)
    5. │   ├─base::withCallingHandlers(...)
    6. │   ├─dplyr::transmute(new_data_frame(.data), !!!quosures)
    7. │   └─dplyr:::transmute.data.frame(new_data_frame(.data), !!!quosures)
    8. │     ├─dplyr::mutate(.data, !!!dots, .keep = "none")
    9. │     └─dplyr:::mutate.data.frame(.data, !!!dots, .keep = "none")
   10. │       └─dplyr:::mutate_cols(.data, ..., caller_env = caller_env())
   11. │         ├─base::withCallingHandlers(...)
   12. │         └─mask$eval_all_mutate(quo)
   13. ├─base::.handleSimpleError(...)
   14. │ └─dplyr:::h(simpleError(msg, call))
   15. │   └─rlang::abort(...)
   16. │     └─rlang:::signal_abort(cnd)
   17. │       └─base::signalCondition(cnd)
   18. └─(function (cnd) ...
  Execution halted

Work dir:
  /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/60/2c4f95726ea41cab0f7ba9d5f56489

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
[poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/60/2c4f95726ea41cab0f7ba9d5f56489/.command.err
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.3     ✔ purrr   0.3.4
✔ tibble  3.1.2     ✔ dplyr   1.0.6
✔ tidyr   1.1.3     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘reshape2’

The following object is masked from ‘package:tidyr’:

    smiths

Error: arrange() failed at implicit mutate() step.
* Problem with `mutate()` column `..1`.
ℹ `..1 = V2`.
✖ object 'V2' not found
Backtrace:
     █
  1. ├─contigs %>% arrange(desc(V2))
  2. ├─dplyr::arrange(., desc(V2))
  3. ├─dplyr:::arrange.data.frame(., desc(V2))
  4. │ └─dplyr:::arrange_rows(.data, dots)
  5. │   ├─base::withCallingHandlers(...)
  6. │   ├─dplyr::transmute(new_data_frame(.data), !!!quosures)
  7. │   └─dplyr:::transmute.data.frame(new_data_frame(.data), !!!quosures)
  8. │     ├─dplyr::mutate(.data, !!!dots, .keep = "none")
  9. │     └─dplyr:::mutate.data.frame(.data, !!!dots, .keep = "none")
 10. │       └─dplyr:::mutate_cols(.data, ..., caller_env = caller_env())
 11. │         ├─base::withCallingHandlers(...)
 12. │         └─mask$eval_all_mutate(quo)
 13. ├─base::.handleSimpleError(...)
 14. │ └─dplyr:::h(simpleError(msg, call))
 15. │   └─rlang::abort(...)
 16. │     └─rlang:::signal_abort(cnd)
 17. │       └─base::signalCondition(cnd)
 18. └─(function (cnd) ...
Execution halted
[poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/60/2c4f95726ea41cab0f7ba9d5f56489/.command.log
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.3     ✔ purrr   0.3.4
✔ tibble  3.1.2     ✔ dplyr   1.0.6
✔ tidyr   1.1.3     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘reshape2’

The following object is masked from ‘package:tidyr’:

    smiths

Error: arrange() failed at implicit mutate() step.
* Problem with `mutate()` column `..1`.
ℹ `..1 = V2`.
✖ object 'V2' not found
Backtrace:
     █
  1. ├─contigs %>% arrange(desc(V2))
  2. ├─dplyr::arrange(., desc(V2))
  3. ├─dplyr:::arrange.data.frame(., desc(V2))
  4. │ └─dplyr:::arrange_rows(.data, dots)
  5. │   ├─base::withCallingHandlers(...)
  6. │   ├─dplyr::transmute(new_data_frame(.data), !!!quosures)
  7. │   └─dplyr:::transmute.data.frame(new_data_frame(.data), !!!quosures)
  8. │     ├─dplyr::mutate(.data, !!!dots, .keep = "none")
  9. │     └─dplyr:::mutate.data.frame(.data, !!!dots, .keep = "none")
 10. │       └─dplyr:::mutate_cols(.data, ..., caller_env = caller_env())
 11. │         ├─base::withCallingHandlers(...)
 12. │         └─mask$eval_all_mutate(quo)
 13. ├─base::.handleSimpleError(...)
 14. │ └─dplyr:::h(simpleError(msg, call))
 15. │   └─rlang::abort(...)
 16. │     └─rlang:::signal_abort(cnd)
 17. │       └─base::signalCondition(cnd)
 18. └─(function (cnd) ...
Execution halted
[poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/60/2c4f95726ea41cab0f7ba9d5f56489/.command.out

The files alignments.blasttab and candidate.fa.fai are not empty.
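
The trace shows `arrange(desc(V2))` failing because `contigs` has no column named V2. The script source is not shown in this thread, so the following is only a guess at one thing worth ruling out: if `contigs` is read from candidate.fa.fai, a file that is non-empty but not split into the expected columns would produce this error. A samtools FASTA index has five tab-separated columns (name, length, offset, line bases, line width). The sketch below reports lines that do not match that layout, and the function name is hypothetical.

```python
def malformed_fai_lines(lines, expected_columns=5):
    """Return 1-based numbers of lines that are not valid FASTA .fai rows."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # Column 2 of a .fai row is the sequence length, so it must be an integer.
        if len(fields) != expected_columns or not fields[1].isdigit():
            bad.append(number)
    return bad
```

Running it over the lines of candidate.fa.fai and getting an empty list would rule out the index layout, which would point the search toward how the R script reads its inputs.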