Closed Johnsonzcode closed 1 year ago
@Johnsonzcode I've added some more detail to the workflow inputs. Let me know if you have any more questions on this.
Andrea
@RenzoTale88
I appreciate it a lot! I do need some more information.
Should I use the pan genome vg file from this pipeline, https://github.com/evotools/CattleGraphGenomePaper/tree/master/cactus, or the one from this pipeline, https://github.com/evotools/CattleGraphGenomePaper/tree/master/wgcactus, to detect non-reference sequence?
If I should use the vg file from https://github.com/evotools/CattleGraphGenomePaper/tree/master/cactus, how do I merge the vg files from the different chromosomes?
From the paper, the chromosome-by-chromosome pan genome for VG5 was built with https://github.com/evotools/CattleGraphGenomePaper/tree/master/cactus, and it seems that the non-reference sequence was detected from the pan genome generated by https://github.com/evotools/CattleGraphGenomePaper/tree/master/wgcactus. Is that right?
I have already generated the chromosome-by-chromosome pan genome, but I used `chrN` as the sequence name in all assemblies. Must the pan genome be built following the `GENOME.SEQUENCE` naming convention? If so, maybe I should build another one.
I look forward to your reply. Johnson
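If the graph does have to be rebuilt with `GENOME.SEQUENCE` names, the assemblies can be renamed before alignment. The following is a minimal Python sketch, assuming plain (uncompressed) FASTA input and assuming the convention is exactly the genome label, a dot, and the original sequence name. The function names and the example label are hypothetical and not part of the pipeline.

```python
def prefix_fasta_headers(lines, genome):
    """Yield FASTA lines with each header renamed to GENOME.SEQUENCE.

    Only the first word of the header (the sequence ID) is kept;
    any free-text description after it is dropped.
    """
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            yield f">{genome}.{seq_id}\n"
        else:
            yield line


def rename_fasta(in_path, out_path, genome):
    """Stream in_path to out_path, prefixing every sequence ID."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(prefix_fasta_headers(src, genome))
```

For example, `rename_fasta("assembly.fa", "assembly.renamed.fa", "GenomeA")` would turn a header `>chr1` into `>GenomeA.chr1`.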
@Johnsonzcode
@RenzoTale88
After building VG5, the SVs and small variants should be added to it. If I use the vg software to do so, the chromosome-by-chromosome graphs have to be merged into one vg file first. How do I do that?
@Johnsonzcode at the time, it was possible simply by concatenating the multiple VG files with `cat` (see here) and then running `vg ids -j`. In recent versions, I believe the recommended way is to use `vg combine`.
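As a rough illustration of the `vg combine` route, the per-chromosome graphs can be passed to one command and its standard output redirected to the merged file. This is a minimal Python sketch, assuming `vg` is on the PATH and that the installed version provides the `combine` subcommand (check `vg combine --help` first). The helper names and file names are hypothetical.

```python
import shutil
import subprocess


def combine_command(graphs):
    """Build the argument list for `vg combine graph1.vg graph2.vg ...`."""
    if not graphs:
        raise ValueError("at least one graph file is required")
    return ["vg", "combine", *[str(g) for g in graphs]]


def combine_graphs(graphs, out_path):
    """Run vg combine and write the merged graph (its stdout) to out_path."""
    if shutil.which("vg") is None:
        raise RuntimeError("vg was not found on the PATH")
    with open(out_path, "wb") as out:
        subprocess.run(combine_command(graphs), stdout=out, check=True)
```

A call such as `combine_graphs(["chr1.vg", "chr2.vg"], "merged.vg")` corresponds to running `vg combine chr1.vg chr2.vg > merged.vg` in a shell.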
@RenzoTale88 I appreciate it so much.
@RenzoTale88
For `nf-GraphSeq`, if there are no short contigs in the assembly, i.e. short contigs were filtered out, should I just provide an empty file for `--contigs`?
nuc
and `seqkit fx2tab`
but both are inappropriate. Are there any recommendations?
@Johnsonzcode
@RenzoTale88 The pipeline fails with errors.
N E X T F L O W ~ version 22.10.1
Launching `nf-GraphSeq/main.nf` [peaceful_mayer] DSL2 - revision: 77e3a1fa1e
Non-ref sequence v 0.5a
================================
PG : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_align.vg
Reference genome : xxx
Sequence pool : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/pep.all.fa.gz
Flanking regions : 1000
Gaps flanking regions : 1000
Novelty cutoff (ratio) : 0.95
executor > local (10)
[33/ad3885] process > make_diamond_db (makedb) [100%] 1 of 1 ✔
[8d/6b2f68] process > non_ref_nodes (non_ref_nodes) [100%] 3 of 3, failed: 3, retries: 2 ✔
[7a/dd0004] process > ref_nodes (non_ref_nodes) [100%] 3 of 3, failed: 3, retries: 2 ✔
[- ] process > add_support_vector -
[49/371926] process > get_gaps (get_gaps) [100%] 3 of 3, failed: 3, retries: 2 ✔
[- ] process > add_gap_info -
[- ] process > combine_regions -
[- ] process > label_regions -
[- ] process > get_repetitiveness -
[- ] process > cleanup -
[- ] process > bedToFasta -
[- ] process > selfalign -
[- ] process > simplify -
[- ] process > getfasta -
[- ] process > getfasta_flanked -
[- ] process > blastx -
[- ] process > abinitio -
[- ] process > filter_abinitio -
[- ] process > abinitio_flank -
[- ] process > filter_abinitio_flank -
[- ] process > consolidate -
[ee/889161] NOTE: Process `get_gaps (get_gaps)` terminated with an error exit status (127) -- Execution is retried (1)
[3c/7ae61a] NOTE: Process `get_gaps (get_gaps)` terminated with an error exit status (127) -- Execution is retried (2)
[67/0493c3] NOTE: Process `ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Execution is retried (1)
[49/371926] NOTE: Process `get_gaps (get_gaps)` terminated with an error exit status (127) -- Error is ignored
[4c/200496] NOTE: Process `non_ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Execution is retried (1)
[98/495b0a] NOTE: Process `ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Execution is retried (2)
[4d/e7e193] NOTE: Process `non_ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Execution is retried (2)
[7a/dd0004] NOTE: Process `ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Error is ignored
[8d/6b2f68] NOTE: Process `non_ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Error is ignored
How can I debug this to find the cause? Do you have any ideas? Thank you!
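The exit statuses in the NOTE lines already narrow things down. In bash, status 127 means a command was not found, which usually points at a tool missing from the environment, while status 1 is a generic failure whose cause has to be read from the task's error output. Here is a small Python sketch that tallies the statuses per process from the console output. The regular expression is written against the lines shown above, and the hint texts are my own wording rather than anything Nextflow prints.

```python
import re
from collections import defaultdict

# Matches e.g. "[ee/889161] NOTE: Process `get_gaps (get_gaps)` terminated
# with an error exit status (127) -- Execution is retried (1)"
NOTE_RE = re.compile(
    r"\[(?P<hash>[0-9a-f]{2}/[0-9a-f]{6})\] NOTE: Process `?(?P<name>[^`]+?)`?"
    r"\s+terminated with an error exit status \((?P<status>\d+)\)"
)

# Interpretations of common shell exit statuses (wording is illustrative).
HINTS = {
    127: "command not found: a required tool is probably missing from PATH",
    1: "generic failure: read .command.err in the task directory",
}


def summarize_failures(text):
    """Return {process name: sorted list of distinct exit statuses}."""
    seen = defaultdict(set)
    for m in NOTE_RE.finditer(text):
        seen[m.group("name")].add(int(m.group("status")))
    return {name: sorted(codes) for name, codes in seen.items()}
```

Applied to the output above, this reports status 127 for `get_gaps` and status 1 for `ref_nodes` and `non_ref_nodes`.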
Hi, could you share the `.nextflow.log` file from the working directory, and the `.command.err` and `.command.log` files in `work/8d/6b2f68*/`?
Are you running the workflow with the Anaconda profile (`-profile conda`)?
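Each Nextflow task runs in its own directory under `work/`, named by the hash shown in square brackets, and keeps its script and captured output there as hidden files. This is a minimal Python sketch for collecting them, assuming the default `work` directory and the standard file names `.exitcode`, `.command.sh`, `.command.err` and `.command.log`. The helper name is hypothetical.

```python
from pathlib import Path

TASK_FILES = (".exitcode", ".command.sh", ".command.err", ".command.log")


def read_task_files(work_dir, short_hash):
    """Return {file name: text} for the task whose hash starts with short_hash.

    short_hash is the "8d/6b2f68" style prefix printed by Nextflow.
    """
    prefix, rest = short_hash.split("/")
    matches = sorted(Path(work_dir, prefix).glob(rest + "*"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected one task directory for {short_hash}, found {len(matches)}"
        )
    found = {}
    for name in TASK_FILES:
        path = matches[0] / name
        if path.is_file():
            found[name] = path.read_text(errors="replace")
    return found
```

Calling `read_task_files("work", "8d/6b2f68")` from the launch directory would gather the files requested above for the last `non_ref_nodes` attempt.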
Here is the config:
params {
pg = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg'
genome_pool = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa'
contigs = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt'
scaffolds = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt'
repetitiveness = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt'
reference = 'CAU_Wild'
proteins = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz'
flanks = 1000
gap_flanks = 1000
novelty_cutoff = 0.95
outfolder = 'outdir'
frc = false
help = false
publish_dir_mode = 'copy'
extra_cluster_options = ''
}
And the command line:
nextflow run nf-GraphSeq/main.nf
Try to run it with
nextflow run nf-GraphSeq/main.nf -profile conda
and see if it works.
This is the `.nextflow.log`, `.command.err` and `.command.log`.
Jan-02 12:12:26.741 [main] DEBUG nextflow.cli.Launcher - $> nextflow run nf-GraphSeq/main.nf
Jan-02 12:12:26.855 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 22.10.1
Jan-02 12:12:26.886 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/storage-01/poultrylab1/.nextflow/plugins; core-plugins: nf-amazon@1.11.0,nf-azure@0.14.2,nf-codecommit@0.1.2,nf-console@1.0.4,nf-ga4gh@1.0.4,nf-google@1.4.4,nf-tower@1.5.5,nf-wave@0.5.2
Jan-02 12:12:26.900 [main] INFO org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Jan-02 12:12:26.901 [main] INFO org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Jan-02 12:12:26.907 [main] INFO org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Jan-02 12:12:26.922 [main] INFO org.pf4j.AbstractPluginManager - No plugins
Jan-02 12:12:26.945 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/nextflow.config
Jan-02 12:12:26.947 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/nextflow.config
Jan-02 12:12:26.970 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Jan-02 12:12:27.870 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from script declararion
Jan-02 12:12:27.889 [main] INFO nextflow.cli.CmdRun - Launching `nf-GraphSeq/main.nf` [peaceful_mayer] DSL2 - revision: 77e3a1fa1e
Jan-02 12:12:27.890 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Jan-02 12:12:27.890 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
Jan-02 12:12:27.900 [main] DEBUG nextflow.secret.LocalSecretsProvider - Secrets store: /storage-01/poultrylab1/.nextflow/secrets/store.json
Jan-02 12:12:27.904 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@3468ee6e] - activable => nextflow.secret.LocalSecretsProvider@3468ee6e
Jan-02 12:12:27.967 [main] DEBUG nextflow.Session - Session UUID: cfa054dd-5654-4180-aec4-e41dd42e351b
Jan-02 12:12:27.968 [main] DEBUG nextflow.Session - Run name: peaceful_mayer
Jan-02 12:12:27.968 [main] DEBUG nextflow.Session - Executor pool size: 208
Jan-02 12:12:27.979 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=624; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Jan-02 12:12:28.006 [main] DEBUG nextflow.cli.CmdRun -
Version: 22.10.1 build 5828
Created: 27-10-2022 16:58 UTC (28-10-2022 00:58 CDT)
System: Linux 3.10.0-1127.19.1.el7.x86_64
Runtime: Groovy 3.0.13 on OpenJDK 64-Bit Server VM 11.0.13+7-b1751.21
Encoding: UTF-8 (UTF-8)
Process: 298667@pbsnode01 [202.112.170.234]
CPUs: 208 - Mem: 1007.1 GB (439.6 GB) - Swap: 64 GB (62 GB)
Jan-02 12:12:28.028 [main] DEBUG nextflow.Session - Work-dir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work [xfs]
Jan-02 12:12:28.048 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Jan-02 12:12:28.060 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Jan-02 12:12:28.089 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Jan-02 12:12:28.101 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 209; maxThreads: 1000
Jan-02 12:12:28.234 [main] DEBUG nextflow.Session - Session start
Jan-02 12:12:28.526 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Jan-02 12:12:28.547 [main] INFO nextflow.Nextflow - Non-ref sequence v 0.5a
================================
PG : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg
Reference genome : CAU_Wild
Sequence pool : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions : 1000
Gaps flanking regions : 1000
Novelty cutoff (ratio) : 0.95
Jan-02 12:12:29.269 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:large` matches labels `large` for process with name make_diamond_db
Jan-02 12:12:29.290 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.290 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.299 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
Jan-02 12:12:29.308 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=208; memory=1007.1 GB; capacity=208; pollInterval=100ms; dumpInterval=5m
Jan-02 12:12:29.473 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:large` matches labels `large` for process with name non_ref_nodes
Jan-02 12:12:29.475 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.475 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.485 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:large` matches labels `large` for process with name ref_nodes
Jan-02 12:12:29.486 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.486 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.493 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:medium` matches labels `medium` for process with name add_support_vector
Jan-02 12:12:29.495 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.495 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.502 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:medium` matches labels `medium` for process with name get_gaps
Jan-02 12:12:29.503 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.503 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.510 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:medium` matches labels `medium` for process with name add_gap_info
Jan-02 12:12:29.511 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.511 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.518 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:medium` matches labels `medium` for process with name combine_regions
Jan-02 12:12:29.519 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.519 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.524 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:small` matches labels `small` for process with name label_regions
Jan-02 12:12:29.525 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.526 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.562 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:medium` matches labels `medium` for process with name get_repetitiveness
Jan-02 12:12:29.563 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.563 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.568 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:medium` matches labels `medium` for process with name cleanup
Jan-02 12:12:29.569 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.569 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.582 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:small` matches labels `small` for process with name bedToFasta
Jan-02 12:12:29.583 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.583 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.594 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:medium` matches labels `medium` for process with name selfalign
Jan-02 12:12:29.594 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.594 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.598 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:medium` matches labels `medium` for process with name simplify
Jan-02 12:12:29.599 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.599 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.603 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:small` matches labels `small` for process with name getfasta
Jan-02 12:12:29.604 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.604 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.608 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:small` matches labels `small` for process with name getfasta_flanked
Jan-02 12:12:29.609 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.609 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.616 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:large` matches labels `large` for process with name blastx
Jan-02 12:12:29.617 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.617 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.624 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:large` matches labels `large` for process with name abinitio
Jan-02 12:12:29.625 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.625 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.629 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:large` matches labels `large` for process with name filter_abinitio
Jan-02 12:12:29.629 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.630 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.633 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:large` matches labels `large` for process with name abinitio_flank
Jan-02 12:12:29.634 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.634 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.638 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:large` matches labels `large` for process with name filter_abinitio_flank
Jan-02 12:12:29.639 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.639 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.643 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:large` matches labels `large` for process with name consolidate
Jan-02 12:12:29.643 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jan-02 12:12:29.643 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jan-02 12:12:29.646 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: abinitio_flank, filter_abinitio, getfasta, consolidate, add_support_vector, combine_regions, frc_filter, get_gaps, add_gap_info, ref_nodes, non_ref_nodes, get_repetitiveness, getfasta_flanked, label_regions, filter_abinitio_flank, selfalign, cleanup, make_diamond_db, abinitio, bedToFasta, blastx, simplify
Jan-02 12:12:29.646 [main] DEBUG nextflow.Session - Igniting dataflow network (21)
Jan-02 12:12:29.647 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > make_diamond_db
Jan-02 12:12:29.648 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > non_ref_nodes
Jan-02 12:12:29.649 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > ref_nodes
Jan-02 12:12:29.649 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > add_support_vector
Jan-02 12:12:29.651 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > get_gaps
Jan-02 12:12:29.655 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > add_gap_info
Jan-02 12:12:29.656 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > combine_regions
Jan-02 12:12:29.659 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > label_regions
Jan-02 12:12:29.663 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > get_repetitiveness
Jan-02 12:12:29.665 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > cleanup
Jan-02 12:12:29.665 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > bedToFasta
Jan-02 12:12:29.666 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > selfalign
Jan-02 12:12:29.667 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > simplify
Jan-02 12:12:29.667 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > getfasta
Jan-02 12:12:29.669 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > getfasta_flanked
Jan-02 12:12:29.671 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > blastx
Jan-02 12:12:29.671 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > abinitio
Jan-02 12:12:29.672 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > filter_abinitio
Jan-02 12:12:29.674 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > abinitio_flank
Jan-02 12:12:29.674 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > filter_abinitio_flank
Jan-02 12:12:29.676 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > consolidate
Jan-02 12:12:29.677 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
Jan-02 12:12:29.678 [main] DEBUG nextflow.Session - Session await
Jan-02 12:12:29.875 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:29.879 [Task submitter] INFO nextflow.Session - [67/0493c3] Submitted process > ref_nodes (non_ref_nodes)
Jan-02 12:12:29.887 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:29.887 [Task submitter] INFO nextflow.Session - [33/ad3885] Submitted process > make_diamond_db (makedb)
Jan-02 12:12:29.892 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:29.893 [Task submitter] INFO nextflow.Session - [ee/889161] Submitted process > get_gaps (get_gaps)
Jan-02 12:12:29.908 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:29.908 [Task submitter] INFO nextflow.Session - [4c/200496] Submitted process > non_ref_nodes (non_ref_nodes)
Jan-02 12:12:29.950 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 3; name: get_gaps (get_gaps); status: COMPLETED; exit: 127; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ee/8891615478d38741b5b5d60707cfcc]
Jan-02 12:12:29.969 [Task monitor] INFO nextflow.processor.TaskProcessor - [ee/889161] NOTE: Process `get_gaps (get_gaps)` terminated with an error exit status (127) -- Execution is retried (1)
Jan-02 12:12:29.979 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:29.980 [Task submitter] INFO nextflow.Session - [3c/7ae61a] Re-submitted process > get_gaps (get_gaps)
Jan-02 12:12:30.004 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 5; name: get_gaps (get_gaps); status: COMPLETED; exit: 127; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/3c/7ae61a11a0309e8b12155103a06251]
Jan-02 12:12:30.006 [Task monitor] INFO nextflow.processor.TaskProcessor - [3c/7ae61a] NOTE: Process `get_gaps (get_gaps)` terminated with an error exit status (127) -- Execution is retried (2)
Jan-02 12:12:30.012 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:30.013 [Task submitter] INFO nextflow.Session - [49/371926] Re-submitted process > get_gaps (get_gaps)
Jan-02 12:12:30.025 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 2; name: ref_nodes (non_ref_nodes); status: COMPLETED; exit: 1; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/67/0493c3012c3d9db7140126e10845a8]
Jan-02 12:12:30.027 [Task monitor] INFO nextflow.processor.TaskProcessor - [67/0493c3] NOTE: Process `ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Execution is retried (1)
Jan-02 12:12:30.033 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 6; name: get_gaps (get_gaps); status: COMPLETED; exit: 127; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/49/371926995e814c580a2542579f55a6]
Jan-02 12:12:30.033 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:30.034 [Task submitter] INFO nextflow.Session - [98/495b0a] Re-submitted process > ref_nodes (non_ref_nodes)
Jan-02 12:12:30.035 [Task monitor] INFO nextflow.processor.TaskProcessor - [49/371926] NOTE: Process `get_gaps (get_gaps)` terminated with an error exit status (127) -- Error is ignored
Jan-02 12:12:30.088 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 4; name: non_ref_nodes (non_ref_nodes); status: COMPLETED; exit: 1; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/4c/2004966fa2d9e47295125d7ed44598]
Jan-02 12:12:30.090 [Task monitor] INFO nextflow.processor.TaskProcessor - [4c/200496] NOTE: Process `non_ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Execution is retried (1)
Jan-02 12:12:30.099 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:30.099 [Task submitter] INFO nextflow.Session - [4d/e7e193] Re-submitted process > non_ref_nodes (non_ref_nodes)
Jan-02 12:12:30.183 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 7; name: ref_nodes (non_ref_nodes); status: COMPLETED; exit: 1; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/98/495b0a7d7090e5bf03b7f7613fd001]
Jan-02 12:12:30.185 [Task monitor] INFO nextflow.processor.TaskProcessor - [98/495b0a] NOTE: Process `ref_nodes (non_ref_nodes)` terminated with an error exit status (1) -- Execution is retried (2)
Jan-02 12:12:30.192 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:30.193 [Task submitter] INFO nextflow.Session - [7a/dd0004] Re-submitted process > ref_nodes (non_ref_nodes)
Jan-02 12:12:30.276 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 8; name: non_ref_nodes (non_ref_nodes); status: COMPLETED; exit: 1; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/4d/e7e193db2dce336ea3185004417e74]
Jan-02 12:12:30.278 [Task monitor] INFO nextflow.processor.TaskProcessor - [4d/e7e193] NOTE: Process non_ref_nodes (non_ref_nodes)
terminated with an error exit status (1) -- Execution is retried (2)
Jan-02 12:12:30.285 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-02 12:12:30.285 [Task submitter] INFO nextflow.Session - [8d/6b2f68] Re-submitted process > non_ref_nodes (non_ref_nodes)
Jan-02 12:12:30.345 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: make_diamond_db (makedb); status: COMPLETED; exit: 0; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/33/ad3885d42846a999d57b1894bf08c4]
Jan-02 12:12:30.358 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 9; name: ref_nodes (non_ref_nodes); status: COMPLETED; exit: 1; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/7a/dd000427cd6a6b220976c877c8e362]
Jan-02 12:12:30.359 [Task monitor] INFO nextflow.processor.TaskProcessor - [7a/dd0004] NOTE: Process ref_nodes (non_ref_nodes)
terminated with an error exit status (1) -- Error is ignored
Jan-02 12:12:30.430 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 10; name: non_ref_nodes (non_ref_nodes); status: COMPLETED; exit: 1; error: -; workDir: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/8d/6b2f6876f776f59f5e126f779e82ca]
Jan-02 12:12:30.431 [Task monitor] INFO nextflow.processor.TaskProcessor - [8d/6b2f68] NOTE: Process non_ref_nodes (non_ref_nodes)
terminated with an error exit status (1) -- Error is ignored
Jan-02 12:12:30.433 [main] DEBUG nextflow.Session - Session await > all processes finished
Jan-02 12:12:30.530 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jan-02 12:12:30.540 [main] DEBUG nextflow.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=1; failedCount=9; ignoredCount=3; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=6; abortedCount=0; succeedDuration=850ms; failedDuration=1.1s; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=4; peakCpus=7; peakMemory=20 GB; ]
Jan-02 12:12:30.745 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Jan-02 12:12:30.756 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'FileTransfer' shutdown completed (hard=false)
Jan-02 12:12:30.756 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
File "/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/bin/01A-NonRefNodes", line 56 if tot % 100000 == 0: print("Processed {} nodes {}\r".format(tot, " " * 50), end = '') ^ SyntaxError: invalid syntax
File "/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/bin/01A-NonRefNodes", line 56 if tot % 100000 == 0: print("Processed {} nodes {}\r".format(tot, " " * 50), end = '') ^ SyntaxError: invalid syntax
Try to run it with
nextflow run nf-GraphSeq/main.nf -profile conda
and see if it works.
From: johnsonz. Sent: Monday, January 2, 2023 12:41:25 PM. Subject: Re: [evotools/CattleGraphGenomePaper] Help about non-reference sequence detection (Issue #1)
Here is the config:
params {
    pg = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg'
    genome_pool = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa'
    contigs = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt'
    scaffolds = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt'
    repetitiveness = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt'
    reference = 'CAU_Wild'
    proteins = '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz'
    flanks = 1000
    gap_flanks = 1000
    novelty_cutoff = 0.95
    outfolder = 'outdir'
    frc = false
    help = false
    publish_dir_mode = 'copy'
    extra_cluster_options = ''
}
And the command line:
nextflow run nf-GraphSeq/main.nf
~/yin/software/cmake-3.16.2-Linux-x86_64/bin/cmake
(nextflow) [poultrylab1@pbsnode01 get_non_ref_seq]$ which gcc
/usr/bin/gcc
(nextflow) [poultrylab1@pbsnode01 get_non_ref_seq]$ which g++
/usr/bin/g++
(nextflow) [poultrylab1@pbsnode01 get_non_ref_seq]$ export CMAKE_C_COMPILER=/usr/bin/gcc
(nextflow) [poultrylab1@pbsnode01 get_non_ref_seq]$ export CMAKE_CXX_COMPILER=/usr/bin/g++
(nextflow) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda
N E X T F L O W ~ version 22.10.1
Launching `nf-GraphSeq/main.nf` [pedantic_knuth] DSL2 - revision: 77e3a1fa1e
Non-ref sequence v 0.5a
================================
PG : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg
Reference genome : CAU_Wild
Sequence pool : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions : 1000
Gaps flanking regions : 1000
Novelty cutoff (ratio) : 0.95
executor > local (4)
[29/c55648] process > make_diamond_db (makedb) [ 0%] 0 of 1
[27/2b2f58] process > non_ref_nodes (non_ref_nodes) [ 0%] 0 of 1
[5d/af1766] process > ref_nodes (non_ref_nodes) [ 0%] 0 of 1
[- ] process > add_support_vector -
[3c/0fc77f] process > get_gaps (get_gaps) [ 0%] 0 of 1
[- ] process > add_gap_info -
[- ] process > combine_regions -
[- ] process > label_regions -
executor > local (4)
[- ] process > make_diamond_db (makedb) -
[- ] process > non_ref_nodes (non_ref_nodes) -
[- ] process > ref_nodes (non_ref_nodes) -
[- ] process > add_support_vector -
[3c/0fc77f] process > get_gaps (get_gaps) [100%] 1 of 1, failed: 1 ✘
[- ] process > add_gap_info -
[- ] process > combine_regions -
[- ] process > label_regions -
[- ] process > get_repetitiveness -
[- ] process > cleanup -
[- ] process > bedToFasta -
[- ] process > selfalign -
[- ] process > simplify -
[- ] process > getfasta -
[- ] process > getfasta_flanked -
[- ] process > blastx -
[- ] process > abinitio -
[- ] process > filter_abinitio -
[- ] process > abinitio_flank -
[- ] process > filter_abinitio_flank -
[- ] process > consolidate -
Error executing process > 'get_gaps (get_gaps)'
Caused by:
Process `get_gaps (get_gaps)` terminated with an error exit status (127)
Command executed:
faToTwoBit genome_pooled.fa genome_pooled.2bit
twoBitInfo -nBed genome_pooled.2bit stdout | awk -v var=1000 'BEGIN{OFS=" "}; $2-var < 0{print $1,"0",$3+var}; $2-var >= 0{print $1,$2-var,$3+var}' | bedtools sort -i - > gaps.bed
Command exit status:
127
Command output:
(empty)
Command error:
.command.sh: line 2: faToTwoBit: command not found
Work dir:
/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/3c/0fc77fbed01a86c92cec8d7a2e941d
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
The error has been fixed by installing the dependencies. Sorry about that.
And the command line is
nextflow run nf-GraphSeq/main.nf -profile conda
But there is another error
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda
N E X T F L O W ~ version 22.10.4
Launching `nf-GraphSeq/main.nf` [desperate_jones] DSL2 - revision: 77e3a1fa1e
Non-ref sequence v 0.5a
================================
PG : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg
Reference genome : CAU_Wild
Sequence pool : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions : 1000
Gaps flanking regions : 1000
Novelty cutoff (ratio) : 0.95
executor > local (8)
[42/72c12e] process > make_diamond_db (makedb) [100%] 1 of 1 ✔
[fd/9a4a4a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[9b/15f2f9] process > ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[a3/469cfe] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[e9/757e11] process > get_gaps (get_gaps) [100%] 1 of 1 ✔
[ca/4fd2bb] process > add_gap_info (supp_vec) [100%] 1 of 1 ✔
[25/f2652e] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
[83/88ea62] process > label_regions (label_reg) [ 0%] 0 of 1
[- ] process > get_repetitiveness -
[- ] process > cleanup -
[- ] process > bedToFasta -
[- ] process > selfalign -
[- ] process > simplify -
executor > local (8)
[42/72c12e] process > make_diamond_db (makedb) [100%] 1 of 1 ✔
[fd/9a4a4a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[9b/15f2f9] process > ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[a3/469cfe] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[e9/757e11] process > get_gaps (get_gaps) [100%] 1 of 1 ✔
[ca/4fd2bb] process > add_gap_info (supp_vec) [100%] 1 of 1 ✔
[25/f2652e] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
[83/88ea62] process > label_regions (label_reg) [100%] 1 of 1, failed: 1 ✘
[- ] process > get_repetitiveness -
[- ] process > cleanup -
[- ] process > bedToFasta -
[- ] process > selfalign -
[- ] process > simplify -
[- ] process > getfasta -
[- ] process > getfasta_flanked -
[- ] process > blastx -
[- ] process > abinitio -
[- ] process > filter_abinitio -
[- ] process > abinitio_flank -
[- ] process > filter_abinitio_flank -
[- ] process > consolidate -
Error executing process > 'label_regions (label_reg)'
Caused by:
Process `label_regions (label_reg)` terminated with an error exit status (1)
Command executed:
bname=`basename -s '.bed' non_ref_nodes.labeled.lengths.merged.bed`
06B-ClassifyRegions -i non_ref_nodes.labeled.lengths.merged.bed -o ${bname} -c contigs.txt -s scaffolds.txt
Command exit status:
1
Command output:
(empty)
Command error:
Traceback (most recent call last):
File "/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/bin/06B-ClassifyRegions", line 68, in <module>
main()
File "/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/bin/06B-ClassifyRegions", line 38, in main
if os.path.exists(args.scaffolds): scaffolds = { i.split()[0]:int(i.strip().split()[1]) for i in open(args.scaffolds) }
File "/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/nf-GraphSeq/bin/06B-ClassifyRegions", line 38, in <dictcomp>
if os.path.exists(args.scaffolds): scaffolds = { i.split()[0]:int(i.strip().split()[1]) for i in open(args.scaffolds) }
IndexError: list index out of range
Work dir:
/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/83/88ea62e80b359d829a3e610928d2d1
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
It seems to be about the contigs.
@Johnsonzcode I wasn't specific enough on this, I think the scaffold list needs the name and the size of the sequence.
OK. Thank you. I will try with the size.
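For reference, the traceback above shows 06B-ClassifyRegions reading each line of the scaffolds file as `i.split()[0]` (the name) and `int(i.strip().split()[1])` (the size), so a file that lists names only raises the IndexError. Below is a minimal Python sketch of building such a two-column list from a FASTA stream. The function names and the demo sequence names are hypothetical and not part of nf-GraphSeq, and whether the contigs file needs the same layout is not confirmed in this thread.

```python
import io


def fasta_lengths(handle):
    """Yield (name, length) for every record in a FASTA stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            # Keep only the first word of the header as the sequence name.
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def format_sizes(records):
    """Render (name, length) pairs as tab-separated 'name<TAB>size' lines."""
    return "".join(f"{name}\t{length}\n" for name, length in records)


# Hypothetical two-record FASTA used only for illustration.
demo = io.StringIO(
    ">CAU_Wild.scaffold1 some description\nACGT\nAC\n"
    ">CAU_Wild.scaffold2\nGGG\n"
)
table = format_sizes(fasta_lengths(demo))
print(table, end="")
```

In practice the same functions could be pointed at the pooled genome FASTA and the output filtered down to the scaffold names before saving it as scaffolds.txt.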
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda -resume
N E X T F L O W ~ version 22.10.4
Launching `nf-GraphSeq/main.nf` [kickass_euler] DSL2 - revision: 77e3a1fa1e
Non-ref sequence v 0.5a
================================
PG : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg
Reference genome : CAU_Wild
Sequence pool : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions : 1000
Gaps flanking regions : 1000
Novelty cutoff (ratio) : 0.95
executor > local (4)
[42/72c12e] process > make_diamond_db (makedb) [100%] 1 of 1, cached: 1 ✔
[fd/9a4a4a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1, cached: 1 ✔
[9b/15f2f9] process > ref_nodes (non_ref_nodes) [100%] 1 of 1, cached: 1 ✔
[a3/469cfe] process > add_support_vector (supp_vec) [100%] 1 of 1, cached: 1 ✔
[e9/757e11] process > get_gaps (get_gaps) [100%] 1 of 1, cached: 1 ✔
[ca/4fd2bb] process > add_gap_info (supp_vec) [100%] 1 of 1, cached: 1 ✔
[25/f2652e] process > combine_regions (combine_reg) [100%] 1 of 1, cached: 1 ✔
executor > local (4)
[42/72c12e] process > make_diamond_db (makedb) [100%] 1 of 1, cached: 1 ✔
[fd/9a4a4a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1, cached: 1 ✔
[9b/15f2f9] process > ref_nodes (non_ref_nodes) [100%] 1 of 1, cached: 1 ✔
[a3/469cfe] process > add_support_vector (supp_vec) [100%] 1 of 1, cached: 1 ✔
[e9/757e11] process > get_gaps (get_gaps) [100%] 1 of 1, cached: 1 ✔
[ca/4fd2bb] process > add_gap_info (supp_vec) [100%] 1 of 1, cached: 1 ✔
[25/f2652e] process > combine_regions (combine_reg) [100%] 1 of 1, cached: 1 ✔
[b1/1b9810] process > label_regions (label_reg) [100%] 1 of 1 ✔
[7d/64b4b2] process > get_repetitiveness (add_rept) [100%] 1 of 1 ✔
[08/1881e4] process > cleanup (cleanup) [100%] 1 of 1 ✔
[ea/141dbd] process > bedToFasta (bed2fa) [100%] 1 of 1, failed: 1 ✘
[- ] process > selfalign -
[- ] process > simplify -
[- ] process > getfasta -
[- ] process > getfasta_flanked -
[- ] process > blastx -
[- ] process > abinitio -
[- ] process > filter_abinitio -
[- ] process > abinitio_flank -
[- ] process > filter_abinitio_flank -
[- ] process > consolidate -
Error executing process > 'bedToFasta (bed2fa)'
Caused by:
Process `bedToFasta (bed2fa)` terminated with an error exit status (1)
Command executed:
python -c 'import sys; [sys.stdout.write( f">{line.strip().split()[0]}_{line.strip().split()[1]}-{line.strip().split()[2]}\n{line.strip().split()[-1]}\n" ) for line in open(sys.argv[1]) if "SEQID" not in line]' non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed > candidate.fa
samtools faidx candidate.fa
Command exit status:
1
Command output:
(empty)
Command error:
Could not build fai index candidate.fa.fai
Work dir:
/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ea/141dbdd5580b7fa7c820e1e69817b9
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
Something about candidate.fa. Maybe it can't be generated.
@Johnsonzcode could you please share the content of .command.err and .command.out in work/ea/141dbdd5580b7fa7c820e1e69817b9?
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ea/141dbdd5580b7fa7c820e1e69817b9/.command.err
Could not build fai index candidate.fa.fai
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ea/141dbdd5580b7fa7c820e1e69817b9/.command.out
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ea/141dbdd5580b7fa7c820e1e69817b9/.command.log
Could not build fai index candidate.fa.fai
Thanks. Can you have a look at the content of non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed in the same folder? If that is empty, you can try having a look at .command.err and .command.out in work/08/1881e4*/.
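The bedToFasta command shown in the log skips every line containing SEQID, so a BED file that holds only a header produces an empty candidate.fa, which is plausibly why `samtools faidx` then reports that it cannot build the index. A minimal Python sketch of the same conversion with an explicit emptiness check follows; `bed_to_fasta` and the demo rows are hypothetical and only mirror the one-liner from the log.

```python
def bed_to_fasta(lines):
    """Convert BED-like rows to FASTA records, mirroring the pipeline one-liner.

    The header is built as SEQID_BPI-BPE and the sequence is the last column.
    Lines containing 'SEQID' (the column header) are skipped.
    """
    records = []
    for line in lines:
        if "SEQID" in line:
            continue
        fields = line.strip().split()
        if len(fields) < 4:
            # Not enough columns to hold coordinates plus a sequence.
            continue
        records.append(f">{fields[0]}_{fields[1]}-{fields[2]}\n{fields[-1]}\n")
    return records


# Hypothetical rows used only for illustration.
header = "SEQID BPI BPE NODES SEQUENCE\n"
row = "chr1 100 200 n1,n2 ACGTACGT\n"

empty_case = bed_to_fasta([header])
filled_case = bed_to_fasta([header, row])

if not empty_case:
    print("No candidate regions: candidate.fa would be empty")
print("".join(filled_case), end="")
```

Checking for an empty record list before indexing makes the real cause (no candidate regions) visible instead of a generic indexing failure.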
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ea/141dbdd5580b7fa7c820e1e69817b9/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed
SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED ZSCORE PVAL SEQUENCE
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/
cat: /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/: Is a directory
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/.command.out
Average autosomal repetitiveness: NA
St.Dev. autosomal repetitiveness: NA
Initial regions (#): 0
Initial regions (bp): 0
Saved regions (#): 0
Saved regions (bp): 0
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/.command.err
Warning message:
In mean.default(repval[, 4]) :
argument is not numeric or logical: returning NA
Warning message:
In var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
NAs introduced by coercion
I think there are two separate issues.
One that is easy to fix might be with the repetitiveness.txt file. Does it have a header? If it does, remove it.
The second might require a bit more digging. Could you look into the different bed files in work/08/1881e4e464402873945f8f01a66ad8? At some point one of them should be empty, which is the failing stage.
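The R warning from the cleanup step (`mean.default(repval[, 4])`: argument is not numeric) suggests that the fourth column of repetitiveness.txt is expected to be numeric, which a header line would break. This is an inference from the log, not a documented format. A minimal Python sketch of dropping a leading header under that assumption follows; `strip_header` and the example rows are hypothetical.

```python
def strip_header(lines, numeric_col=3):
    """Drop the first line if its `numeric_col` field (0-based) is not a number."""
    lines = list(lines)
    if not lines:
        return lines
    fields = lines[0].split()
    try:
        float(fields[numeric_col])
    except (IndexError, ValueError):
        # Missing or non-numeric value: treat the first line as a header.
        return lines[1:]
    return lines


# Hypothetical content used only for illustration.
with_header = ["chrom start end repetitiveness\n", "chr1 0 100 0.42\n"]
without_header = ["chr1 0 100 0.42\n", "chr2 0 100 0.37\n"]

cleaned = strip_header(with_header)
untouched = strip_header(without_header)
print(cleaned, untouched)
```

The function leaves a file without a header unchanged, so it is safe to apply more than once.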
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/
.command.begin
.command.err
.command.log
.command.out
.command.run
.command.sh
.exitcode
non_ref_nodes.labeled.lengths.merged.seqtype.masked.bed
non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.bed
non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.bed
non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.bed
non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.bed
non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed
repetitiveness.txt
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.bed
#SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED SEQUENCE
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed
SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED ZSCORE PVAL SEQUENCE
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/*.bed
#SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED SEQUENCE
#SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED SEQUENCE
#SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED SEQUENCE
#SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED SEQUENCE
#SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED SEQUENCE
SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED ZSCORE PVAL SEQUENCE
All the bed files in this folder are empty.
Then we need to go back even further. Have a look at the content of work/7d/64b4b2*/ to check whether the bed files are empty and whether there is an error in the logs there. If so, you can keep going backwards until you reach the stage where the issue starts. You can find the working folder of each stage printed before the process name while nextflow is running. For instance:
[08/1881e4] process > cleanup (cleanup) [100%] 1 of 1 ✔
The folder will be work/08/1881e4*/
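The backwards search described above can be partly automated. Below is a minimal Python sketch, assuming the layout work/<prefix>/<hash>/ seen in this thread and treating lines that start with SEQID or #SEQID as headers. The helper names are hypothetical, and the demo builds a throwaway directory rather than touching a real work folder.

```python
import glob
import os
import tempfile


def resolve_workdirs(work_root, short_hash):
    """Expand a short task hash such as '08/1881e4' into matching work directories."""
    return sorted(glob.glob(os.path.join(work_root, short_hash + "*")))


def count_records(path):
    """Count data lines, skipping blank lines and SEQID / #SEQID header lines."""
    records = 0
    with open(path) as handle:
        for line in handle:
            text = line.strip()
            if text and not text.lstrip("#").startswith("SEQID"):
                records += 1
    return records


def summarise(work_root, short_hash):
    """Map each BED file name in the task folder(s) to its number of data lines."""
    summary = {}
    for folder in resolve_workdirs(work_root, short_hash):
        for bed in sorted(glob.glob(os.path.join(folder, "*.bed"))):
            summary[os.path.basename(bed)] = count_records(bed)
    return summary


# Demo on a temporary directory that mimics work/<prefix>/<hash>/.
demo_root = tempfile.mkdtemp()
task_dir = os.path.join(demo_root, "08", "1881e4e464402873945f8f01a66ad8")
os.makedirs(task_dir)
with open(os.path.join(task_dir, "header_only.bed"), "w") as out:
    out.write("#SEQID BPI BPE SEQUENCE\n")
with open(os.path.join(task_dir, "with_data.bed"), "w") as out:
    out.write("#SEQID BPI BPE SEQUENCE\nchr1 10 20 ACGT\n")

demo_summary = summarise(demo_root, "08/1881e4")
print(demo_summary)
```

Running `summarise` on each stage's short hash, from the last stage back to the first, shows where the record count first drops to zero.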
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda -resume
N E X T F L O W ~ version 22.10.4
Launching `nf-GraphSeq/main.nf` [ridiculous_lamarr] DSL2 - revision: 77e3a1fa1e
Non-ref sequence v 0.5a
================================
PG : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg
Reference genome : CAU_Wild
Sequence pool : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions : 1000
Gaps flanking regions : 1000
Novelty cutoff (ratio) : 0.95
executor > local (1)
[42/72c12e] process > make_diamond_db (makedb) [100%] 1 of 1, cached: 1 ✔
[fd/9a4a4a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1, cached: 1 ✔
[9b/15f2f9] process > ref_nodes (non_ref_nodes) [100%] 1 of 1, cached: 1 ✔
[a3/469cfe] process > add_support_vector (supp_vec) [100%] 1 of 1, cached: 1 ✔
[e9/757e11] process > get_gaps (get_gaps) [100%] 1 of 1, cached: 1 ✔
[ca/4fd2bb] process > add_gap_info (supp_vec) [100%] 1 of 1, cached: 1 ✔
[25/f2652e] process > combine_regions (combine_reg) [100%] 1 of 1, cached: 1 ✔
executor > local (1)
[42/72c12e] process > make_diamond_db (makedb) [100%] 1 of 1, cached: 1 ✔
[fd/9a4a4a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1, cached: 1 ✔
[9b/15f2f9] process > ref_nodes (non_ref_nodes) [100%] 1 of 1, cached: 1 ✔
[a3/469cfe] process > add_support_vector (supp_vec) [100%] 1 of 1, cached: 1 ✔
[e9/757e11] process > get_gaps (get_gaps) [100%] 1 of 1, cached: 1 ✔
[ca/4fd2bb] process > add_gap_info (supp_vec) [100%] 1 of 1, cached: 1 ✔
[25/f2652e] process > combine_regions (combine_reg) [100%] 1 of 1, cached: 1 ✔
[b1/1b9810] process > label_regions (label_reg) [100%] 1 of 1, cached: 1 ✔
[7d/64b4b2] process > get_repetitiveness (add_rept) [100%] 1 of 1, cached: 1 ✔
[08/1881e4] process > cleanup (cleanup) [100%] 1 of 1, cached: 1 ✔
[c4/c8e165] process > bedToFasta (bed2fa) [100%] 1 of 1, failed: 1 ✘
[- ] process > selfalign -
[- ] process > simplify -
[- ] process > getfasta -
[- ] process > getfasta_flanked -
[- ] process > blastx -
[- ] process > abinitio -
[- ] process > filter_abinitio -
[- ] process > abinitio_flank -
[- ] process > filter_abinitio_flank -
[- ] process > consolidate -
Error executing process > 'bedToFasta (bed2fa)'
Caused by:
Process `bedToFasta (bed2fa)` terminated with an error exit status (1)
Command executed:
python -c 'import sys; [sys.stdout.write( f">{line.strip().split()[0]}_{line.strip().split()[1]}-{line.strip().split()[2]}\n{line.strip().split()[-1]}\n" ) for line in open(sys.argv[1]) if "SEQID" not in line]' non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed > candidate.fa
samtools faidx candidate.fa
Command exit status:
1
Command output:
(empty)
Command error:
Could not build fai index candidate.fa.fai
Work dir:
/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/c4/c8e165a2a05ebd6cec3bedd355b523
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/*.bed
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.bed <==
#SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED SEQUENCE
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.bed <==
#SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED SEQUENCE
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.bed <==
#SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED SEQUENCE
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.bed <==
#SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED SEQUENCE
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.bed <==
#SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED SEQUENCE
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/08/1881e4e464402873945f8f01a66ad8/non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed <==
SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED ZSCORE PVAL SEQUENCE
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/7d/64b4b2
head: cannot open '/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/7d/64b4b2' for reading: No such file or directory
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/7d/64b4b2bc31fa46e57e33fb3d3c4592/*.bed
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/7d/64b4b2bc31fa46e57e33fb3d3c4592/non_ref_nodes.labeled.lengths.merged.seqtype.bed <==
#SEQID BPI BPE NODES N_NODES STRANDS SEQS N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/7d/64b4b2bc31fa46e57e33fb3d3c4592/non_ref_nodes.labeled.lengths.merged.seqtype.masked.bed <==
#SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED SEQUENCE
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/b1/1b9810662cd70463bcf6239de64450/*.bed
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/b1/1b9810662cd70463bcf6239de64450/non_ref_nodes.labeled.lengths.merged.bed <==
SEQID BPI BPE NODES N_NODES STRANDS SEQS N_CLOSE_TO_GAPS NODES_LENGTH
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/b1/1b9810662cd70463bcf6239de64450/non_ref_nodes.labeled.lengths.merged.seqtype.bed <==
#SEQID BPI BPE NODES N_NODES STRANDS SEQS N_CLOSE_TO_GAPS NODES_LENGTH REGION_SIZE CLASSIFICATION
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/25/f2652e682b9119b75000e87cd4c38d/*.bed
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/25/f2652e682b9119b75000e87cd4c38d/non_ref_nodes.labeled.bed <==
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/25/f2652e682b9119b75000e87cd4c38d/non_ref_nodes.labeled.lengths.bed <==
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/25/f2652e682b9119b75000e87cd4c38d/non_ref_nodes.labeled.lengths.merged.bed <==
SEQID BPI BPE NODES N_NODES STRANDS SEQS N_CLOSE_TO_GAPS NODES_LENGTH
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ca/4fd2bb1906c75070e0e943587addbf/*.bed
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ca/4fd2bb1906c75070e0e943587addbf/gaps.bed <==
CAU_Wild.chr1 64186 66286
CAU_Wild.chr1 118671 120678
CAU_Wild.chr1 216126 218220
CAU_Wild.chr1 307007 309107
CAU_Wild.chr1 342517 344564
CAU_Wild.chr1 463082 465168
CAU_Wild.chr1 595503 597603
CAU_Wild.chr1 12966826 12968885
CAU_Wild.chr1 12976926 12979026
CAU_Wild.chr1 12999216 13001301
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ca/4fd2bb1906c75070e0e943587addbf/non_ref_nodes.labeled.bed <==
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ca/4fd2bb1906c75070e0e943587addbf/non_ref_nodes.noNmers.bed <==
==> /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/ca/4fd2bb1906c75070e0e943587addbf/non_ref_nodes.supvec.bed <==
Maybe removing the header of repetitiveness.txt is a better choice.
The error seems to have occurred at the beginning of the workflow. Could you please share the logs in work/fd/9a4a4a*/ ?
I removed the header and reran the pipeline, but got the same error.
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda
N E X T F L O W ~ version 22.10.4
Launching `nf-GraphSeq/main.nf` [sleepy_feynman] DSL2 - revision: 77e3a1fa1e
Non-ref sequence v 0.5a
================================
PG : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg
Reference genome : CAU_Wild
Sequence pool : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions : 1000
Gaps flanking regions : 1000
Novelty cutoff (ratio) : 0.95
executor > local (11)
[a6/233f28] process > make_diamond_db (makedb) [100%] 1 of 1 ✔
[d0/f5d91d] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[82/18a7f3] process > ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[40/5ee068] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[c6/138589] process > get_gaps (get_gaps) [100%] 1 of 1 ✔
[b1/40f191] process > add_gap_info (supp_vec) [100%] 1 of 1 ✔
[60/091eea] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
[cc/04bf6b] process > label_regions (label_reg) [100%] 1 of 1 ✔
[ed/e72c53] process > get_repetitiveness (add_rept) [100%] 1 of 1 ✔
[09/427483] process > cleanup (cleanup) [100%] 1 of 1 ✔
[d0/9427c8] process > bedToFasta (bed2fa) [100%] 1 of 1, failed: 1 ✘
[- ] process > selfalign -
[- ] process > simplify -
[- ] process > getfasta -
[- ] process > getfasta_flanked -
[- ] process > blastx -
[- ] process > abinitio -
[- ] process > filter_abinitio -
[- ] process > abinitio_flank -
[- ] process > filter_abinitio_flank -
[- ] process > consolidate -
Error executing process > 'bedToFasta (bed2fa)'
Caused by:
Process `bedToFasta (bed2fa)` terminated with an error exit status (1)
Command executed:
python -c 'import sys; [sys.stdout.write( f">{line.strip().split()[0]}_{line.strip().split()[1]}-{line.strip().split()[2]}\n{line.strip().split()[-1]}\n" ) for line in open(sys.argv[1]) if "SEQID" not in line]' non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed > candidate.fa
samtools faidx candidate.fa
Command exit status:
1
Command output:
(empty)
Command error:
Could not build fai index candidate.fa.fai
Work dir:
/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/d0/9427c85b509d166b380685e09cd553
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ head /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/d0/f5d91d7dd754401d0e25bcf6c72eca/.command.log
Read input PG
Found:
- 41 nodes
- 0 edges
- 41 paths
Getting reference paths
Getting reference node ids
Getting query paths
Getting query-specific nodes
Save nodes and their positions in the different genomes
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/d0/f5d91d7dd754401d0e25bcf6c72eca/.command.log
Read input PG
Found:
- 41 nodes
- 0 edges
- 41 paths
Getting reference paths
Getting reference node ids
Getting query paths
Getting query-specific nodes
Save nodes and their positions in the different genomes
I'm afraid something seems to be wrong with the graph. It appears it's got only 41 nodes, and no edges (connections between nodes). You probably need to regenerate it and try again.
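A note on why the failure surfaces in bedToFasta rather than earlier: the listings earlier in this thread show that the candidate BED contains only its header line. The one-liner skips every line containing SEQID, so it writes nothing and candidate.fa is empty, which is the likely reason samtools faidx cannot build an index. Below is a minimal sketch of the same conversion in a more readable form, with an explicit record count. The function name bed_to_fasta is hypothetical and not part of nf-GraphSeq; the column layout (SEQID, BPI, BPE first, SEQUENCE last) is taken from the header shown in the logs.

```python
# Hypothetical, readable equivalent of the bedToFasta one-liner (not pipeline code).
HEADER = ("SEQID BPI BPE NODES N_NODES STRANDS NODE_SEQUENCE N_CLOSE_TO_GAPS "
          "NODES_LENGTH REGION_SIZE CLASSIFICATION N_MASKED N_NT RATIO_MASKED "
          "ZSCORE PVAL SEQUENCE")


def bed_to_fasta(bed_lines):
    """Return (fasta_text, n_records) for the given candidate BED lines."""
    records = []
    for line in bed_lines:
        fields = line.strip().split()
        # Skip header lines as the one-liner does; blank lines are skipped too,
        # which the one-liner does not do.
        if not fields or "SEQID" in line:
            continue
        name = f"{fields[0]}_{fields[1]}-{fields[2]}"
        records.append(f">{name}\n{fields[-1]}\n")
    return "".join(records), len(records)


# A header-only BED yields no records, so the FASTA text is empty.
fasta, n_records = bed_to_fasta([HEADER])
if n_records == 0:
    print("No candidate regions: candidate.fa would be empty")
```

Checking the record count before indexing makes it clear that the real problem lies upstream (no non-reference nodes were found), not in samtools.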
I generated the graph with the scripts here.
The phylogenetic tree was calculated with mashtree.
The config file looks like this:
((liancheng:0.00321,guangxi:0.00270):0.00012,pekin:0.00374,((tufted:0.01022,CAU_Wild:0.00349):0.00037,laying:0.00293):0.00011);
pekin ../genome_chr/pekin_CHR_named.fa
tufted ../genome_chr/tufted_duck_CHR_named.fa
laying ../genome_chr/laying_CHR_named.fa
liancheng ../genome_chr/liancheng_CHR_named.fa
guangxi ../genome_chr/guangxi_CHR_named.fa
CAU_Wild ../genome_chr/CAU_Wild_CHR_named.fa
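As a quick sanity check on a config file like the one above, the leaf names in the tree can be compared with the genome names listed below it. This is only a sketch: parse_seqfile is a hypothetical helper, it assumes the layout shown here (a Newick tree on the first line, then one "name path" pair per line), and the regular expression handles plain Newick labels only (no quoted names).

```python
import re


def parse_seqfile(text):
    """Split a config file into Newick leaf names and a {genome: path} mapping."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    tree, entries = lines[0], lines[1:]
    # Leaf labels are the names that directly follow "(" or "," in the tree;
    # branch lengths follow ":" and internal labels follow ")", so both are skipped.
    leaves = set(re.findall(r"[(,]\s*([^():,;\s]+)", tree))
    genomes = {}
    for entry in entries:
        name, path = entry.split(None, 1)
        genomes[name] = path
    return leaves, genomes


config = """\
((liancheng:0.00321,guangxi:0.00270):0.00012,pekin:0.00374,((tufted:0.01022,CAU_Wild:0.00349):0.00037,laying:0.00293):0.00011);
pekin ../genome_chr/pekin_CHR_named.fa
tufted ../genome_chr/tufted_duck_CHR_named.fa
laying ../genome_chr/laying_CHR_named.fa
liancheng ../genome_chr/liancheng_CHR_named.fa
guangxi ../genome_chr/guangxi_CHR_named.fa
CAU_Wild ../genome_chr/CAU_Wild_CHR_named.fa
"""

leaves, genomes = parse_seqfile(config)
missing_paths = leaves - genomes.keys()   # in the tree but without a FASTA
missing_leaves = genomes.keys() - leaves  # listed but absent from the tree
print("tree-only names:", sorted(missing_paths))
print("file-only names:", sorted(missing_leaves))
```

For the config above, both sets are empty, so the tree and the genome list agree on all six names.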
You can check whether the HAL alignments are fine. If so, you can check the conversion to PG using hal2vg. I suspect that is where the process failed. You can try re-converting it and check that it is valid before proceeding with the analyses.
The log file from cactus
[2022-12-30T19:59:26+0800] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/61b138b1cd1a5a4284708cac2d27dec5/2ffc/worker_log.txt
[2022-12-30T19:59:28+0800] [MainThread] [I] [toil.leader] Finished toil run successfully.
[2022-12-30T19:59:28+0800] [MainThread] [I] [toil.realtimeLogger] Stopping real-time logging server.
[2022-12-30T19:59:29+0800] [MainThread] [I] [toil.realtimeLogger] Joining real-time logging server thread.
[2022-12-30T20:00:14+0800] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/storage-02/zhaoqiangsen/pan_genome/mwgs/jobStore)
[2022-12-30T20:00:14+0800] [MainThread] [I] [toil.statsAndLogging] Cactus has finished after 39190.21106318687 seconds
It looks fine.
The hal2vg step reported no errors.
The file sizes for the six duck genomes (~1 Gb per genome) are:
507M Dec 31 09:27 five_duck_align.vg
0 Dec 31 09:24 hal2vg.sh.log
221 Dec 31 09:24 hal2vg.sh
19M Dec 31 09:24 nohup.out
3.7G Dec 30 19:59 five_duck_align.hal
1.3K Dec 30 09:06 cactus.sh
389 Dec 29 23:01 duck_pangenome.txt
When I say validate, I mean with the appropriate tool (halValidate or VG). Nevertheless, is the input in packed graph (PG) format? The VG format does not work with the script.
OK, thank you so much. I am checking the HAL file with halValidate, and I used five_duck_align.vg as input. How can I check whether it is in packed graph (PG) format?
You can see the guidelines on the VG wiki. You can also convert with the vg view command and the appropriate input/output options. You can also specify the output when converting with hal2vg.
First, I checked the HAL file five_duck_align.hal:
File valid
Second, I checked the VG file five_duck_align.vg generated by hal2vg:
(graphseq) [poultrylab1@pbsnode01 mwgs]$ vg validate five_duck_align.vg
graph: valid
Third, I converted it to packed graph format and checked five_duck_align.vg.packed.graph:
~/zhaoqiangsen/software/cactus-bin-v2.2.4/bin/vg convert -p five_duck_align.vg> five_duck_align.vg.packed.graph
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ vg validate five_duck_align.vg.packed.graph
graph: valid
five_duck_align.vg.packed.graph was then used as the Nextflow input:
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda
N E X T F L O W ~ version 22.10.4
Launching `nf-GraphSeq/main.nf` [naughty_brenner] DSL2 - revision: 77e3a1fa1e
Non-ref sequence v 0.5a
================================
PG : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg.packed.graph
Reference genome : CAU_Wild
Sequence pool : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions : 1000
Gaps flanking regions : 1000
Novelty cutoff (ratio) : 0.95
executor > local (11)
[25/b6a400] process > make_diamond_db (makedb) [100%] 1 of 1 ✔
[b1/d4c7f5] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[d1/af6ef8] process > ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[16/71bc27] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[66/2e6866] process > get_gaps (get_gaps) [100%] 1 of 1 ✔
[d8/9af0bb] process > add_gap_info (supp_vec) [100%] 1 of 1 ✔
[10/a98c00] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
[90/2e8953] process > label_regions (label_reg) [100%] 1 of 1 ✔
[4f/07299d] process > get_repetitiveness (add_rept) [100%] 1 of 1 ✔
[a9/196c5c] process > cleanup (cleanup) [100%] 1 of 1 ✔
[43/a5c98e] process > bedToFasta (bed2fa) [100%] 1 of 1, failed: 1 ✘
[- ] process > selfalign -
[- ] process > simplify -
[- ] process > getfasta -
[- ] process > getfasta_flanked -
[- ] process > blastx -
[- ] process > abinitio -
[- ] process > filter_abinitio -
[- ] process > abinitio_flank -
[- ] process > filter_abinitio_flank -
[- ] process > consolidate -
Error executing process > 'bedToFasta (bed2fa)'
Caused by:
Process `bedToFasta (bed2fa)` terminated with an error exit status (1)
Command executed:
python -c 'import sys; [sys.stdout.write( f">{line.strip().split()[0]}_{line.strip().split()[1]}-{line.strip().split()[2]}\n{line.strip().split()[-1]}\n" ) for line in open(sys.argv[1]) if "SEQID" not in line]' non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed > candidate.fa
samtools faidx candidate.fa
Command exit status:
1
Command output:
(empty)
Command error:
Could not build fai index candidate.fa.fai
Work dir:
/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/43/a5c98ea8c53d2fb63c8095efdf4665
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ bash README
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat work/b1/d4c7f55bace42950640621585dfbda/
cat: work/b1/d4c7f55bace42950640621585dfbda/: Is a directory
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ l work/b1/d4c7f55bace42950640621585dfbda/
total 0
-rw-rw-r-- 1 poultrylab1 poultrylab1 0 Jan 3 16:50 non_ref_nodes.bed
lrwxrwxrwx 1 poultrylab1 poultrylab1 72 Jan 3 16:50 five_duck_align.vg.packed.graph -> /storage-02/zhaoqiangsen/pan_genome/mwgs/five_duck_align.vg.packed.graph
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ ls work/b1/d4c7f55bace42950640621585dfbda/
five_duck_align.vg.packed.graph non_ref_nodes.bed
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat work/b1/d4c7f55bace42950640621585dfbda/
.command.begin .command.log .command.run .exitcode non_ref_nodes.bed
.command.err .command.out .command.sh five_duck_align.vg.packed.graph
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ cat work/b1/d4c7f55bace42950640621585dfbda/.command.log
Read input PG
Found:
- 41 nodes
- 0 edges
- 41 paths
Getting reference paths
Getting reference node ids
Getting query paths
Getting query-specific nodes
Save nodes and their positions in the different genomes
There are still 0 edges. How can I find the cause?
I'm afraid it is quite difficult without having access to the data. My only guess is that it is running out of memory, though it's puzzling that it is not crashing. Are you running it with enough memory (>128G)? Do you have a way of sharing the graph, so that I can test what is going wrong?
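Because the pipeline keeps running even when the graph has no edges, one way to catch this early is to read the node/edge/path summary that non_ref_nodes writes to .command.log and stop if it looks degenerate. The sketch below assumes the log format shown in this thread ("- 41 nodes", "- 0 edges", "- 41 paths"); the function names are hypothetical and not part of nf-GraphSeq.

```python
import re


def parse_graph_summary(log_text):
    """Extract the node, edge and path counts from a non_ref_nodes .command.log."""
    counts = {}
    for kind in ("nodes", "edges", "paths"):
        match = re.search(rf"(\d+)\s+{kind}\b", log_text)
        if match:
            counts[kind] = int(match.group(1))
    return counts


def has_no_edges(counts):
    """True when the log reports an edge count and that count is zero."""
    return counts.get("edges") == 0


log_text = """\
Read input PG
Found:
- 41 nodes
- 0 edges
- 41 paths
Getting reference paths
Getting reference node ids
"""

counts = parse_graph_summary(log_text)
if has_no_edges(counts):
    print(f"Suspicious graph: {counts['nodes']} nodes but no edges")
```

A check like this could be run on work/<hash>/.command.log right after the first process finishes, before waiting for the downstream steps to fail.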
From: johnsonz @.> Sent: Tuesday, January 3, 2023 9:31:38 AM To: evotools/CattleGraphGenomePaper @.> Cc: RenzoTale88 @.>; Mention @.> Subject: Re: [evotools/CattleGraphGenomePaper] Help about non-reference sequence detection (Issue #1)
First I check hal file five_duck_align.hal
File valid
Second I check vg file five_duck_align.vg generated by hal2vg
(graphseq) @.*** mwgs]$ vg validate five_duck_align.vg
graph: valid
Third I check five_duck_align.vg.packed.graph
~/zhaoqiangsen/software/cactus-bin-v2.2.4/bin/vg convert -p five_duck_align.vg> five_duck_align.vg.packed.graph
(graphseq) @.*** get_non_ref_seq]$ vg validate five_duck_align.vg.packed.graph
graph: valid
And five_duck_align.vg.packed.graph is used for nextflow input.
(graphseq) @.*** get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda
N E X T F L O W ~ version 22.10.4
Launching nf-GraphSeq/main.nf
[naughty_brenner] DSL2 - revision: 77e3a1fa1e
Non-ref sequence v 0.5a
================================
PG : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg.packed.graph
Reference genome : CAU_Wild
Sequence pool : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions : 1000
Gaps flanking regions : 1000
Novelty cutoff (ratio) : 0.95
executor > local (11)
[25/b6a400] process > make_diamond_db (makedb) [100%] 1 of 1 ✔
[b1/d4c7f5] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[d1/af6ef8] process > ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[16/71bc27] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[66/2e6866] process > get_gaps (get_gaps) [100%] 1 of 1 ✔
[d8/9af0bb] process > add_gap_info (supp_vec) [100%] 1 of 1 ✔
[10/a98c00] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
executor > local (11)
[25/b6a400] process > make_diamond_db (makedb) [100%] 1 of 1 ✔
[b1/d4c7f5] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[d1/af6ef8] process > ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[16/71bc27] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[66/2e6866] process > get_gaps (get_gaps) [100%] 1 of 1 ✔
[d8/9af0bb] process > add_gap_info (supp_vec) [100%] 1 of 1 ✔
[10/a98c00] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
[90/2e8953] process > label_regions (label_reg) [100%] 1 of 1 ✔
[4f/07299d] process > get_repetitiveness (add_rept) [100%] 1 of 1 ✔
[a9/196c5c] process > cleanup (cleanup) [100%] 1 of 1 ✔
[43/a5c98e] process > bedToFasta (bed2fa) [100%] 1 of 1, failed: 1 ✘
[- ] process > selfalign -
[- ] process > simplify -
[- ] process > getfasta -
[- ] process > getfasta_flanked -
[- ] process > blastx -
[- ] process > abinitio -
[- ] process > filter_abinitio -
[- ] process > abinitio_flank -
[- ] process > filter_abinitio_flank -
[- ] process > consolidate -
Error executing process > 'bedToFasta (bed2fa)'
Caused by:
Process bedToFasta (bed2fa)
terminated with an error exit status (1)
Command executed:
python -c 'import sys; [sys.stdout.write( f">{line.strip().split()[0]}_{line.strip().split()[1]}-{line.strip().split()[2]}\n{line.strip().split()[-1]}\n" ) for line in open(sys.argv[1]) if "SEQID" not in line]' non_ref_nodes.labeled.lengths.merged.seqtype.masked.long.novel.noTelomere.noFlankGaps.lowrep.candidate.bed > candidate.fa
samtools faidx candidate.fa
Command exit status:
1
Command output:
(empty)
Command error:
Could not build fai index candidate.fa.fai
Work dir:
/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/43/a5c98ea8c53d2fb63c8095efdf4665
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
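For readers following along, the one-liner in the failed command is easier to follow when expanded. The sketch below is a hedged restatement of it, not the pipeline's own code: the names `bed_to_fasta` and `to_fasta_text` and the four-field guard are additions made here for illustration. It shows that a BED file containing only a header line (or nothing at all) yields no FASTA records, which would leave candidate.fa empty and is a plausible reason for `samtools faidx` failing to build the index.

```python
def bed_to_fasta(bed_lines):
    """Yield (header, sequence) pairs from BED-like lines.

    Mirrors the logged one-liner: header is <col1>_<col2>-<col3>,
    sequence is the last column, and lines containing "SEQID" are skipped.
    """
    for line in bed_lines:
        if "SEQID" in line:          # header row, skipped as in the one-liner
            continue
        fields = line.strip().split()
        if len(fields) < 4:          # extra guard (assumption): ignore blank/short lines
            continue
        yield f"{fields[0]}_{fields[1]}-{fields[2]}", fields[-1]


def to_fasta_text(bed_lines):
    """Render the records as FASTA text; empty input gives an empty string."""
    return "".join(f">{header}\n{seq}\n" for header, seq in bed_to_fasta(bed_lines))
```

If `to_fasta_text(open(bed_path))` returns an empty string for the candidate BED file, the problem lies upstream of bedToFasta rather than in the indexing step itself.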
(graphseq) @.*** get_non_ref_seq]$ bash README
(graphseq) @.*** get_non_ref_seq]$ cat work/b1/d4c7f55bace42950640621585dfbda/
cat: work/b1/d4c7f55bace42950640621585dfbda/: Is a directory
(graphseq) @.*** get_non_ref_seq]$ l work/b1/d4c7f55bace42950640621585dfbda/
total 0
-rw-rw-r-- 1 poultrylab1 poultrylab1 0 Jan 3 16:50 non_ref_nodes.bed
lrwxrwxrwx 1 poultrylab1 poultrylab1 72 Jan 3 16:50 five_duck_align.vg.packed.graph -> /storage-02/zhaoqiangsen/pan_genome/mwgs/five_duck_align.vg.packed.graph
(graphseq) @.*** get_non_ref_seq]$ ls work/b1/d4c7f55bace42950640621585dfbda/
five_duck_align.vg.packed.graph non_ref_nodes.bed
(graphseq) @.*** get_non_ref_seq]$ cat work/b1/d4c7f55bace42950640621585dfbda/
.command.begin .command.log .command.run .exitcode non_ref_nodes.bed
.command.err .command.out .command.sh five_duck_align.vg.packed.graph
(graphseq) @.*** get_non_ref_seq]$ cat work/b1/d4c7f55bace42950640621585dfbda/.command.log
Read input PG
Found:
41 nodes
0 edges
41 paths
Getting reference paths
Getting reference node ids
Getting query paths
Getting query-specific nodes
Save nodes and their positions in the different genomes
There are still 0 edges. How can I find the reason?
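The counts in the log above (41 nodes, 0 edges, 41 paths) suggest that every path is a single unconnected node, so no sequence was actually aligned into a graph. A minimal sketch for catching this early is shown below; it only parses lines shaped like the ones in this .command.log, and the function names `graph_counts` and `looks_disconnected` are hypothetical helpers, not part of nf-GraphSeq.

```python
import re


def graph_counts(log_text):
    """Extract counts from lines such as '41 nodes' or '0 edges'."""
    counts = {}
    pattern = r"^\s*(\d+)\s+(nodes|edges|paths)\s*$"
    for number, kind in re.findall(pattern, log_text, flags=re.MULTILINE):
        counts[kind] = int(number)
    return counts


def looks_disconnected(counts):
    """True when the graph has nodes but not a single edge."""
    return counts.get("nodes", 0) > 0 and counts.get("edges", 0) == 0
```

Running `looks_disconnected(graph_counts(open(".command.log").read()))` inside the non_ref_nodes work directory would flag the situation before the later steps fail on empty files.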
Yes, I have enough memory and nothing crashed. Which graph file do you want to test?
Maybe this is the reason: I used the following command line to convert the HAL file.
hal2vg --noAncestors --hdf5InMemory --rootGenome CAU_Wild five_duck_align.hal > five_duck_align.vg
Actually, I don't know what --noAncestors and --rootGenome do. I just used the scripts from here.
Hi, I would try without --rootGenome and see if it works. If you can share, please provide the HAL and VG files. Share them with @.*** and I'll have a look over the next few days.
Your email address has been hidden. I can share the files, but the HAL file is huge (3 GB). I will also try to rerun without --rootGenome.
The problem is indeed --rootGenome. That option restricts the conversion to the genomes in the clade below the specified root. You have to specify --refGenomes, followed by your reference, instead.
The new version of hal2vg has no --refGenomes option, so I used the wrong option at that time. I am trying to fix it.
All options:
USAGE:
/storage-01/poultrylab1/zhaoqiangsen/software/cactus-bin-v2.2.4/bin/hal2vg [Options] <halFile>
ARGUMENTS:
halFile: input hal file
OPTIONS:
--cacheBytes <value>: obsolete name for --hdf5CacheBytes [default =
15728640]
--cacheMDC <value>: obsolete name for --hdf5CacheMDC [default = 113]
--cacheRDC <value>: obsolete name for --hdf5CacheRDC [default = 599999]
--cacheW0 <value>: obsolete name for --hdf5CacheW0 [default = 0.75]
--chop <value>: chop up nodes in output graph so they are not longer
than given length [default = 0]
--format <value>: choose the back-end storage format. [default = hdf5]
--hdf5CacheBytes <value>: maximum size in bytes of regular hdf5 cache [default =
15728640]
--hdf5CacheMDC <value>: number of metadata slots in hdf5 cache [default = 113]
--hdf5CacheRDC <value>: number of regular slots in hdf5 cache. should be a
prime number ~= 10 * DefaultCacheRDCBytes / chunk
[default = 599999]
--hdf5CacheW0 <value>: w0 parameter for hdf5 cache [default = 0.75]
--hdf5InMemory: load all data in memory (and disable hdf5 cache)
[default = 0]
--help: display this help page [default = 0]
--ignoreGenomes <value>: comma-separated (no spaces) list of genomes to ignore
[default = ""]
--inMemory: obsolete name for --hdf5InMemory [default = 0]
--noAncestors: don't write ancestral paths, nor sequence exclusive to
ancestral genomes [default = 0]
--onlySequenceNames: use only sequence names for output names. By default,
the UCSC convention of Genome.Sequence is used
[default = 0]
--outputFormat <value>: output graph format in {pg, hg, odgi} [default=pg]
[default = pg]
--progress: show progress [default = 0]
--rootGenome <value>: process only genomes in clade with specified root (HAL
root if empty) [default = ""]
--targetGenomes <value>: comma-separated (no spaces) list of target genomes
(others are excluded) (all leaves if empty) [default =
""]
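Since the set of flags differs between hal2vg builds, it can help to check a usage listing like the one above before launching a long conversion. The sketch below is only an illustration: `listed_options` is a hypothetical helper that collects every `--name` that starts a line of help text, and it assumes the help text has the same layout as the listing pasted here.

```python
import re


def listed_options(help_text):
    """Collect the option names that begin a line in a usage listing.

    Options mentioned in the middle of a description (for example the
    target of an 'obsolete name for ...' note) are deliberately ignored.
    """
    return set(re.findall(r"^\s*(--[A-Za-z0-9]+)", help_text, flags=re.MULTILINE))
```

With the listing above as input, `"--refGenomes" in listed_options(help_text)` is false while `"--rootGenome"` is present, which matches what was observed in this thread. The help text itself could be captured with `subprocess.run`; whether hal2vg prints it to stdout or stderr is not something confirmed here, so both streams would need checking.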
I think the version on the GitHub repository has it (see hal2vg.cpp: https://github.com/ComparativeGenomicsToolkit/hal2vg/blob/master/hal2vg.cpp). If using that version doesn't work either, I might need more time to figure out what changed in the software and edit the workflow.
OK, thank you. I will try with this version.
After switching to the previous hal2vg version, another error occurs:
(graphseq) [poultrylab1@pbsnode01 get_non_ref_seq]$ nextflow run nf-GraphSeq/main.nf -profile conda
N E X T F L O W ~ version 22.10.4
Launching `nf-GraphSeq/main.nf` [trusting_hugle] DSL2 - revision: 77e3a1fa1e
Non-ref sequence v 0.5a
================================
PG : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/five_duck_align.vg1
Reference genome : CAU_Wild
Sequence pool : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/genome_pooled.fa
Contigs IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/contigs.txt
Scaffolds IDs : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/scaffolds.txt
Autosomes' repetitiveness : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/repetitiveness.txt
Proteins fasta : /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/Anas_platyrhynchos.ASM874695v1.pep.all.fa.gz
Flanking regions : 1000
Gaps flanking regions : 1000
Novelty cutoff (ratio) : 0.95
executor > local (13)
[fd/88c4aa] process > make_diamond_db (makedb) [100%] 1 of 1 ✔
[60/0be13a] process > non_ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[1c/9c58fb] process > ref_nodes (non_ref_nodes) [100%] 1 of 1 ✔
[24/e3fe87] process > add_support_vector (supp_vec) [100%] 1 of 1 ✔
[10/77e52c] process > get_gaps (get_gaps) [100%] 1 of 1 ✔
[56/b42bc1] process > add_gap_info (supp_vec) [100%] 1 of 1 ✔
[cc/27d451] process > combine_regions (combine_reg) [100%] 1 of 1 ✔
[ed/da7aeb] process > label_regions (label_reg) [100%] 1 of 1 ✔
[9e/4dcc36] process > get_repetitiveness (add_rept) [100%] 1 of 1 ✔
[c5/157bb6] process > cleanup (cleanup) [100%] 1 of 1 ✔
[cb/dee46e] process > bedToFasta (bed2fa) [100%] 1 of 1 ✔
[ee/7a2598] process > selfalign (selfalign) [100%] 1 of 1 ✔
[60/2c4f95] process > simplify (simplify) [100%] 1 of 1, failed: 1 ✘
[- ] process > getfasta -
[- ] process > getfasta_flanked -
[- ] process > blastx -
[- ] process > abinitio -
[- ] process > filter_abinitio -
[- ] process > abinitio_flank -
[- ] process > filter_abinitio_flank -
[- ] process > consolidate -
Error executing process > 'simplify (simplify)'
Caused by:
Process `simplify (simplify)` terminated with an error exit status (1)
Command executed:
09C-DetectDuplicateContigs alignments.blasttab candidate.fa.fai candidate.clump.txt
09D-faiToBed candidate.clump.txt > candidate.clump.bed
Command exit status:
1
Command output:
(empty)
Command error:
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.3 ✔ purrr 0.3.4
✔ tibble 3.1.2 ✔ dplyr 1.0.6
✔ tidyr 1.1.3 ✔ stringr 1.4.0
✔ readr 1.4.0 ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
Attaching package: ‘reshape2’
The following object is masked from ‘package:tidyr’:
smiths
Error: arrange() failed at implicit mutate() step.
* Problem with `mutate()` column `..1`.
ℹ `..1 = V2`.
✖ object 'V2' not found
Backtrace:
█
1. ├─contigs %>% arrange(desc(V2))
2. ├─dplyr::arrange(., desc(V2))
3. ├─dplyr:::arrange.data.frame(., desc(V2))
4. │ └─dplyr:::arrange_rows(.data, dots)
5. │ ├─base::withCallingHandlers(...)
6. │ ├─dplyr::transmute(new_data_frame(.data), !!!quosures)
7. │ └─dplyr:::transmute.data.frame(new_data_frame(.data), !!!quosures)
8. │ ├─dplyr::mutate(.data, !!!dots, .keep = "none")
9. │ └─dplyr:::mutate.data.frame(.data, !!!dots, .keep = "none")
10. │ └─dplyr:::mutate_cols(.data, ..., caller_env = caller_env())
11. │ ├─base::withCallingHandlers(...)
12. │ └─mask$eval_all_mutate(quo)
13. ├─base::.handleSimpleError(...)
14. │ └─dplyr:::h(simpleError(msg, call))
15. │ └─rlang::abort(...)
16. │ └─rlang:::signal_abort(cnd)
17. │ └─base::signalCondition(cnd)
18. └─(function (cnd) ...
Execution halted
Work dir:
/storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/60/2c4f95726ea41cab0f7ba9d5f56489
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
[poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/60/2c4f95726ea41cab0f7ba9d5f56489/.command.err
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.3 ✔ purrr 0.3.4
✔ tibble 3.1.2 ✔ dplyr 1.0.6
✔ tidyr 1.1.3 ✔ stringr 1.4.0
✔ readr 1.4.0 ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
Attaching package: ‘reshape2’
The following object is masked from ‘package:tidyr’:
smiths
Error: arrange() failed at implicit mutate() step.
* Problem with `mutate()` column `..1`.
ℹ `..1 = V2`.
✖ object 'V2' not found
Backtrace:
█
1. ├─contigs %>% arrange(desc(V2))
2. ├─dplyr::arrange(., desc(V2))
3. ├─dplyr:::arrange.data.frame(., desc(V2))
4. │ └─dplyr:::arrange_rows(.data, dots)
5. │ ├─base::withCallingHandlers(...)
6. │ ├─dplyr::transmute(new_data_frame(.data), !!!quosures)
7. │ └─dplyr:::transmute.data.frame(new_data_frame(.data), !!!quosures)
8. │ ├─dplyr::mutate(.data, !!!dots, .keep = "none")
9. │ └─dplyr:::mutate.data.frame(.data, !!!dots, .keep = "none")
10. │ └─dplyr:::mutate_cols(.data, ..., caller_env = caller_env())
11. │ ├─base::withCallingHandlers(...)
12. │ └─mask$eval_all_mutate(quo)
13. ├─base::.handleSimpleError(...)
14. │ └─dplyr:::h(simpleError(msg, call))
15. │ └─rlang::abort(...)
16. │ └─rlang:::signal_abort(cnd)
17. │ └─base::signalCondition(cnd)
18. └─(function (cnd) ...
Execution halted
[poultrylab1@pbsnode01 get_non_ref_seq]$ cat /storage-02/zhaoqiangsen/pan_genome/get_non_ref_seq/work/60/2c4f95726ea41cab0f7ba9d5f56489/.command.out
The files alignments.blasttab and candidate.fa.fai are not empty.
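The R error says that the table called `contigs` has no column named V2. V1, V2, ... are the default names R gives to unnamed columns, so one possibility is that an input table was read with fewer columns than expected. That is an assumption: which file 09C-DetectDuplicateContigs loads as `contigs` has not been checked here. A .fai index normally has five tab-separated columns, so a quick look at the column layout of both inputs could narrow it down. The helper name `column_counts` below is hypothetical.

```python
def column_counts(lines, sep="\t"):
    """Return the set of per-line field counts, ignoring blank lines."""
    return {
        len(line.rstrip("\n").split(sep))
        for line in lines
        if line.strip()
    }
```

For example, `column_counts(open("candidate.fa.fai"))` would be expected to give `{5}`; a result of `{1}` would point to a separator problem (spaces instead of tabs, for instance), and an empty set to a file containing only blank lines.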
Dear @npch @prenderj @RenzoTale88 @prasundutta87, your code shows us a clear way to construct a pangenome. When I tried to detect non-reference sequence, I couldn't find any help documentation, and I don't understand the parameters in CattleGraphGenomePaper/detectSequences/nf-GraphSeq/conf/params.config. Could you please write some help documentation or explain the parameters for us? Thank you in advance!
Best, Johnson