cgroza / GraffiTE

GraffiTE is a pipeline that finds polymorphic transposable elements in genome assemblies and/or long reads, and genotypes the discovered polymorphisms in read sets using genome-graphs.

3_TSD_search output folder missing #24

Open Wicker-Lab opened 7 months ago

Wicker-Lab commented 7 months ago

Hello,

First of all: thank you very much for the pipeline.

I ran the command like this:

nextflow run cgroza/GraffiTE --assemblies assemblies.csv --TE_library nrTREP20 --reference Bgt_genome_v3_16 --graph_method pangenie --genotype false

The goal is to detect SVs in 10 fully assembled genomes, so I have not added any reads for mapping so far.

As far as I understand, the third output folder (3_TSD_search), and in particular pangenome.vcf, should still be written. Is that right?

I only get:

[rest_of_path]/out$ ls
1_SV_search 2_Repeat_Filtering

The output files in these folders look "normal" as far as I can tell. For example, the file "indels.fa.masked" has many sequences, most of which are at least partially repeat-masked.

However, the file genotypes_repmasked_filtered.vcf has no variants, even though there are many in the per-sample VCFs.
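A quick way to confirm this symptom is to count the variant records in each VCF. The sketch below is a minimal helper, not part of GraffiTE: the function name `count_vcf_records` is hypothetical, and it only assumes the standard VCF convention that header lines start with `#`.

```python
import gzip
import sys
from pathlib import Path


def count_vcf_records(vcf_path):
    """Count variant records (non-empty lines that do not start with '#')."""
    path = Path(vcf_path)
    # Plain-text and gzip-compressed VCFs are both handled.
    opener = gzip.open if path.suffix == ".gz" else open
    count = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                count += 1
    return count


if __name__ == "__main__":
    # Pass one or more VCF paths on the command line.
    for vcf in sys.argv[1:]:
        print(f"{vcf}\t{count_vcf_records(vcf)}")
```

Running it on genotypes_repmasked_filtered.vcf and on the per-sample VCFs shows at a glance whether variants were lost during repeat filtering rather than during SV discovery.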

Also, there is no error message at the end of the run:

executor > local (21)
[d3/6be117] process > map_asm (5) [100%] 9 of 9 ✔
[a0/d40a97] process > svim_asm (9) [100%] 9 of 9 ✔
[32/f5dfe6] process > survivor_merge [100%] 1 of 1 ✔
[9c/720983] process > repeatmask_VCF (1) [100%] 1 of 1 ✔
[4f/1f4338] process > tsd_prep (1) [100%] 1 of 1 ✔
[- ] process > tsd_search -
[- ] process > tsd_report -
Completed at: 25-Apr-2024 11:58:28
Duration : 26m 27s
CPU hours : 2.3
Succeeded : 21

Thank you very much for your help!

cgroza commented 7 months ago

Hi

It's a bit difficult for me to track the bug without more information. It seems some step in annotating the VCF with repeats fails.

Could you share svim-asm_variants.vcf from out/1_SV_search, the TE library and the reference genome?

xxYaaoo commented 6 months ago

Hi~

I ran into the same problem. My task showed a 'complete' state, but the output directory only contained [1_SV_search 2_Repeat_Filtering], and there was no error message in my task log:

executor > local (43)
[46/b0b3f5] process > map_longreads (18) [100%] 20 of 20 ✔
[41/029f00] process > sniffles_sample_call (20) [100%] 20 of 20 ✔
[68/fa7e45] process > sniffles_population_call (1) [100%] 1 of 1 ✔
[5a/63a7ce] process > repeatmask_VCF (1) [100%] 1 of 1 ✔
[f7/095c3b] process > tsd_prep (1) [100%] 1 of 1 ✔
[- ] process > tsd_search -
[- ] process > tsd_report -
[- ] process > make_graph -
[- ] process > graph_align_reads -
[- ] process > vg_call -
[- ] process > merge_VCFs -
Completed at: 08-May-2024 19:04:10
Duration : 2d 8h 9m 43s
CPU hours : 2'695.2
Succeeded : 43
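Both logs in this thread show the same pattern: every process up to tsd_prep has a task hash, while the later ones show `[- ]` and were never started. The sketch below pulls that split out of a pasted Nextflow summary. It is only an illustration: the function name `split_processes` and the regular expression are assumptions based on the log format shown here, not a Nextflow API.

```python
import re

# Matches "[9c/720983] process > name" as well as "[- ] process > name".
PROCESS_RE = re.compile(
    r"\[\s*([0-9a-f]{2}/[0-9a-f]{6}|-)\s*\]\s+process\s+>\s+(\w+)"
)


def split_processes(log_text):
    """Return (started, never_started) process names from a Nextflow summary."""
    started, never_started = [], []
    for task_hash, name in PROCESS_RE.findall(log_text):
        if task_hash == "-":
            never_started.append(name)
        else:
            started.append(name)
    return started, never_started
```

The last entry in `started` is the process whose output the never-started ones were waiting for, so its work directory is the first place to look.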

I use GRCh38 as my reference genome. Here are my sniffles2_variants.vcf from [out/1_SV_search] and my TE library: sniffles2_variants.xlsx ERV_element_seq_library_0318.txt

Any idea how to solve this problem? Thank you for your help!

clemgoub commented 6 months ago

Hello @Wicker-Lab and @xxYaaoo,

I'm sorry I missed both your messages.

This error previously happened to me because of an issue with the default /tmp dir not being accessible. During the annotation process, several calls are made to create a tmp dir, and on some machines the default /tmp dir space is too small. A typical symptom is that the folder "2_Repeat_Filtering" is created, but there are no variants in the VCF "genotypes_repmasked_filtered".

To check whether this is the case, look at the Nextflow logs for this specific process. They should be located in work/9c/720983*/.command.out or work/9c/720983*/.command.err for you @Wicker-Lab, and in work/5a/63a7ce*/.command.out or work/5a/63a7ce*/.command.err for you @xxYaaoo.

Look for something like mktemp: failed to create directory [...]: No space left on device or unable to create /x/y/z: No space left on device (or similar). If that's the case, look here. If you can't find anything related to a /tmp directory issue, send us the complete logs here and we will figure out what's wrong!
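The log check described above can be scripted. The following is a minimal sketch, assuming the work directory layout mentioned in this comment (work/<hash prefix>*/.command.out and .command.err). The helper names `find_tmp_errors` and `tmp_free_gib` and the list of marker strings are hypothetical and only cover the two messages quoted above.

```python
import shutil
import tempfile
from pathlib import Path

# Substrings taken from the error messages quoted in this thread.
TMP_ERROR_MARKERS = ("No space left on device", "mktemp: failed to create")


def find_tmp_errors(work_dir, task_prefix):
    """Scan .command.out/.command.err of matching task dirs for tmp errors.

    task_prefix is the short hash shown by Nextflow, e.g. "9c/720983".
    Returns a list of (log file path, matching line) tuples.
    """
    hits = []
    for task_dir in sorted(Path(work_dir).glob(task_prefix + "*")):
        for log_name in (".command.out", ".command.err"):
            log_file = task_dir / log_name
            if not log_file.is_file():
                continue
            for line in log_file.read_text(errors="replace").splitlines():
                if any(marker in line for marker in TMP_ERROR_MARKERS):
                    hits.append((str(log_file), line.strip()))
    return hits


def tmp_free_gib(tmp_dir=None):
    """Free space, in GiB, of tmp_dir (default: the system temp directory)."""
    usage = shutil.disk_usage(tmp_dir or tempfile.gettempdir())
    return usage.free / 2**30


if __name__ == "__main__":
    for path, line in find_tmp_errors("work", "9c/720983"):
        print(f"{path}: {line}")
    print(f"free space in default tmp dir: {tmp_free_gib():.1f} GiB")
```

An empty result does not rule out a temporary directory problem: if the process runs inside a container, the temp directory it sees may differ from the one checked on the host.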

Another possibility might be the error discussed here: https://github.com/cgroza/GraffiTE/issues/8

Sorry again for the delay; I'm looking forward to getting you out of trouble!

Cheers,

Clément