DrosophilaGenomeEvolution / TrEMOLO

Transposable Elements MOvement detection using LOng reads
GNU General Public License v3.0

MAKE REPORT ERROR #18

Open zeruiWang opened 2 months ago

zeruiWang commented 2 months ago

Hi, Thank you for providing TrEMOLO! I got an error when running test data via singularity.

CHECKING R PACKAGES...
MAKE REPORT
Rscript -e "bookdown::render_book('index.Rmd', 'bookdown::gitbook')"
1/35
2/35 [init]
3/35
4/35 [get_library]
5/35
6/35 [PARAMETERS]
7/35
8/35 [COUNT_TE_INSIDER_INSERTION]
9/35
10/35 [COUNT_TE_INSIDER_DELETION]
11/35
12/35 [TSD_INSIDER]
13/35
14/35 [rm_INSIDER]
15/35
16/35 [MAPPING_STATS_OUTSIDER]
17/35
18/35 [show_STATS]
19/35
20/35 [COUNT_SV_OUTSIDER]
21/35
22/35 [NB_OUTSIDER]
23/35
24/35 [COUNT_TSD_OUTSIDER]
25/35
26/35 [TSD]
27/35
28/35 [rm_OUTSIDER]
29/35
30/35 [COUNT_TE_INOUTSIDER]
31/35
32/35 [NB_INOUTSIDER]
33/35
34/35 [rm_INOUTSIDER]
35/35
/usr/bin/pandoc +RTS -K512m -RTS mini_report.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output mini_report.html --lua-filter /usr/local/lib/R/site-library/bookdown/rmarkdown/lua/custom-environment.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/latex-div.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/anchor-sections.lua --metadata-file /tmp/RtmpFlIiyr/file402207996d99 --wrap preserve --standalone --section-divs --table-of-contents --toc-depth 3 --template /usr/local/lib/R/site-library/bookdown/templates/gitbook.html --highlight-style pygments --number-sections --css style.css --mathjax --include-in-header /tmp/RtmpFlIiyr/rmarkdown-str4022014632dab.html
BYE !
AN ERROR OCCURRED!

[SNK INFO] ERROR PIPELINE; snakefile used : /data/WangZerui/singularity/work_test/SNAKE_USED/Snakefile_outsider.snk
Check LOG : /data/WangZerui/singularity/work_test/log/Snakefile_outsider.log
Check ERROR : /data/WangZerui/singularity/work_test/log/Snakefile_outsider.err

Removing temporary output file /data/WangZerui/singularity/work_test/rep_tmp_snk.
[Fri Apr 19 12:01:45 2024] Finished job 0.
1 of 1 steps (100%) done
Complete log: /data/WangZerui/singularity/.snakemake/log/2024-04-19T115832.259098.snakemake.log

Snakefile_outsider.zip

I hope you can help me solve this problem! Thanks, Zerui

M-D75 commented 2 months ago

Hi,

Thank you for reporting the various issues. A new version should be available by Wednesday at the latest, addressing the two problems you've just mentioned, plus a few new features.

However, if you wish to retrieve the update before the completion of the ongoing tests, you can do so by cloning the corresponding branch:

git clone -b new-module-frequency-multi-generations https://github.com/DrosophilaGenomeEvolution/TrEMOLO.git

You will need to rebuild the Singularity container, which has also been updated:

sudo singularity build TrEMOLO.simg TrEMOLO/Singularity

I am still curious about the SAM TO DELTA error. Could you please show me the contents of the file log/pm_contigs_against_ref.sam.log and let me know if the file OUTSIDER/INSIDER_VR/pm_against_ref.sam is empty?

Thank you again for the reports, M-D

zeruiWang commented 2 months ago

Hi,

Thank you for responding to me, it's very helpful.

About the SAM TO DELTA error: I checked my config.yaml and found that PRESET_OPTION was set to 'map-ont', while I am using mouse PacBio sequencing data. I corrected this and reran TrEMOLO. The SAM TO DELTA error did not occur again, but the output was overwritten, so I am unable to provide the log/pm_contigs_against_ref.sam.log file.

I will try running the test data and my own data again. I hope it runs smoothly this time, but I also have a feeling that I may encounter some different errors. I hope that when the time comes, you'll still be able to guide me through resolving them.

Also, I'm using Ubuntu 22.04. I'm not sure if this will have any impact.

Thank you again for your help. Zerui

M-D75 commented 2 months ago

Thank you for this information.

Please note that the error you encountered with the test dataset does not prevent obtaining one of the main output files, TE_INFOS.bed. However, if you wish to resolve this issue, you can simply rebuild your Singularity container from the file located on the new-module-frequency-multi-generations branch. Alternatively, to avoid any problems, you can retrieve the precompiled Singularity file compatible with the current version you are using by clicking on this link.

Being on Ubuntu 22.04 should not cause any issues if you are using the Singularity container which loads Ubuntu 20.04. However, this version is still receiving updates until 2025. Therefore, if you have recently built the container, this may have caused some issues with the versions of R packages, as some are no longer available. This seems to be the source of the error you are encountering with the test set.

Please feel free to report any other issues, M-D

zeruiWang commented 2 months ago

Hi,

Thank you for your help.

I successfully completed the test run on the Ubuntu 20.04 system, using the files from the new-module-frequency-multi-generations branch and the precompiled Singularity file you provided.

However, when I ran it on my own data, the following error occurred:

••• [SNK]--[Sun Apr 21 00:31:45 CST 2024] GET SV INSIDER •••

[Sun Apr 21 00:31:45 CST 2024] LOG TASK /data/tremolo/log/SV_INSIDER.log, /data/tremolo/log/SV_INSIDER.err
MAPPING GENOME ON REFERENCE...
AN ERROR OCCURRED!
BYE !

The content of SV_INSIDER.err:

Traceback (most recent call last):
  File "/data/tremolo/TrEMOLO/lib/python/assemblitics/sam2delta.py", line 241, in <module>
    write_delta(alns, sam_file + '.delta')
  File "/data/tremolo/TrEMOLO/lib/python/assemblitics/sam2delta.py", line 142, in write_delta
    f.write('>%s %s %r %r\n' % (aln[0], aln[1], ref_chr_lens[aln[0]], query_len))
KeyError: '15'

My assembled genome was generated from the SAMPLE file with Flye, and my GENOME file is "Mus_musculus.GRCm38.dna.toplevel.fa" from Ensembl.

Thanks, Zerui

M-D75 commented 2 months ago

Hi,

Thank you for reporting this issue. Could you please show me the contents of the your_work_directory/log/pm_contigs_against_ref.sam.log file?

M-D

zeruiWang commented 2 months ago

Hi,

Certainly. Additionally, another error occurred in OUTSIDER.

pm_contigs_against_ref.sam.log

[SNK]--[Sun Apr 21 20:37:07 CST 2024] FIND SV ON REF •••

ERROR : SAM TO DELTA ✘
ERROR : Assemblytics_uniq_anchor.py ✘
ERROR : filter_gap_SVs.py ✘
[SNK]--[Sun Apr 21 21:52:08 CST 2024]--[INFO] REFERENCE : Mus_musculus.GRCm38.dna.toplevel.fa
[SNK]--[Sun Apr 21 21:52:08 CST 2024]--[INFO] PSEUDO GENOME : PSEUDO_GENOME_TE_DB_ID.fasta

AN ERROR OCCURRED!

[SNK INFO] ERROR PIPELINE; snakefile used : /data/tremolo/SNAKE_USED/Snakefile_outsider.snk
Check LOG : /data/tremolo/log/Snakefile_outsider.log
Check ERROR : /data/tremolo/log/Snakefile_outsider.err

Removing temporary output file /data/tremolo/rep_tmp_snk.
[Sun Apr 21 21:52:18 2024] Finished job 0.
1 of 1 steps (100%) done
Complete log: /data/.snakemake/log/2024-04-21T003143.321066.snakemake.log

The file OUTSIDER/INSIDER_VR/pm_against_ref.sam is empty.

Zerui

M-D75 commented 2 months ago

Please forgive me in advance if I am asking too much. Do you remember whether, the first time you encountered the ERROR: SAM TO DELTA, you also had a problem at the first stage of the pipeline? Here is the error message:

[SNK]--[Sun Apr 21 00:31:45 CST 2024] GET SV INSIDER
•••

[Sun Apr 21 00:31:45 CST 2024] LOG TASK /data/tremolo/log/SV_INSIDER.log, /data/tremolo/log/SV_INSIDER.err
MAPPING GENOME ON REFERENCE...
AN ERROR OCCURRED!

Thank you for the pm_contigs_against_ref.sam.log file. It contains a warning:

[WARNING] For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix.

In your context, this requires the use of the --split-prefix option. I can add this, but I would like to know if the pm_against_ref.sam file was still properly generated. Could you run this command:

singularity exec TrEMOLO.simg samtools quickcheck your_work_directory/OUTSIDER/INSIDER_VR/pm_against_ref.sam

If you see no error messages, it means the file was correctly generated.
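The KeyError: '15' reported earlier and the missing @SQ lines are plausibly connected: sam2delta.py looks up a reference length by name (ref_chr_lens[aln[0]]), and those lengths presumably come from the @SQ header lines. As a complement to samtools quickcheck, here is a minimal Python sketch that lists reference names used by alignments but never declared in the header. The function name references_missing_from_header is hypothetical and not part of TrEMOLO or Assemblytics.

```python
def references_missing_from_header(sam_lines):
    """Return reference names used by alignments but absent from @SQ lines.

    Hypothetical helper for diagnosis only; it reads plain-text SAM lines.
    """
    declared = set()  # names announced by @SQ SN:<name> header fields
    used = set()      # names found in the RNAME column (3rd field)
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        declared.add(field[3:])
            continue
        fields = line.split("\t")
        # RNAME is "*" for unmapped reads, which need no @SQ entry.
        if len(fields) > 2 and fields[2] != "*":
            used.add(fields[2])
    return sorted(used - declared)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        missing = references_missing_from_header(handle)
    print("missing @SQ entries:", missing or "none")
```

An empty result means every aligned reference is declared in the header; any name listed would be expected to trigger the same kind of KeyError in a script that indexes lengths by reference name.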

Additionally, you might use the TrEMOLO folder you may have retrieved from the new-module-frequency-multi-generations branch. This pipeline does not go through steps such as FIND SV ON REF or FIND TE ON REF, but through a different step called LIFT OFF. Of course, this is just an option, and you are not obliged to test it, knowing that we are still conducting tests to ensure everything works. It's up to you.

Thanks, M-D

zeruiWang commented 2 months ago

Yes, I remember. My data was obtained from PacBio CCS BAM files using bam2fastx. Could that be the reason?

I ran 'singularity exec TrEMOLO.simg samtools quickcheck your_work_directory/OUTSIDER/INSIDER_VR/pm_against_ref.sam' and received the message '/data/tremolo/OUTSIDER/INSIDER_VR/pm_against_ref.sam could not be opened for reading.'

I'm happy to test it. Also, I shouldn't need to add --split-prefix in the samtools PRESET_OPTION in config.yaml, right?

Thanks, Zerui

M-D75 commented 2 months ago

Oops, sorry: the pm_against_ref.sam file was probably deleted by Snakemake, because it is declared as an output file and the pipeline was not launched with the --keep-incomplete option. That is why you are encountering this error. Sorry.

I now understand better how the data was generated, which may require two adjustments. The --split-prefix option is a minimap2 option. In the config.yaml file, the minimap2 options only concern the GENOME vs SAMPLE (reads) part, so instead you could directly modify the TrEMOLO/rules.snk file at lines 1532 and 4499 (if it is from the master branch). Add --split-prefix tmp_TrEMOLO and remove the --cs option for the reasons you just mentioned. The asm5 preset should be changed to asm20, as the divergence with insertion and deletion errors might be quite high. Here is what it should look like after modification:

#at line 1532
#before 
#minimap2 -ax asm5 --cs -t {params.threads} {input.ref} {input.genome_real} > {output.sam} 2> {params.work_directory}/log/pm_contigs_against_ref.sam.log
#after
minimap2 -ax asm20 --split-prefix tmp_TrEMOLO -t {params.threads} {input.ref} {input.genome_real} > {output.sam} 2> {params.work_directory}/log/pm_contigs_against_ref.sam.log

#at line 4499
#before 
#minimap2 -ax asm5 --cs -t {params.threads} {input.ref} {input.genome} > {output.sam} 2> {params.work_directory}/log/pm_contigs_against_ref.sam.log
#after
minimap2 -ax asm20 --split-prefix tmp_TrEMOLO -t {params.threads} {input.ref} {input.genome} > {output.sam} 2> {params.work_directory}/log/pm_contigs_against_ref.sam.log

Once the file has been modified and saved, you can restart the pipeline, in the hope that this is where the problem actually lies.

Thank you for your time, M-D

zeruiWang commented 2 months ago

I tried both the master branch and the new-module-frequency-multi-generations branch, and modified the TrEMOLO/rules.snk file. I used the TrEMOLO.simg from the v2.4.0 release (https://github.com/DrosophilaGenomeEvolution/TrEMOLO/releases/download/v2.4.0/TrEMOLO.simg). The same INSIDER error happened again, and I shut down the program.

[Mon Apr 22 12:31:06 CST 2024] LOG TASK /data/tremolo/output/log/SV_INSIDER.log, /data/tremolo/output/log/SV_INSIDER.err
MAPPING GENOME ON REFERENCE...
AN ERROR OCCURRED!

pm_contigs_against_ref.sam.log:

[M::mm_idx_gen::51.613*1.70] collected minimizers
[M::mm_idx_gen::64.155*2.52] sorted minimizers
[M::main::64.155*2.52] loaded/built the index for 33 target sequence(s)
[M::mm_mapopt_update::69.976*2.39] mid_occ = 177
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 33
[M::mm_idx_stat::75.213*2.29] distinct minimizers: 349851373 (93.30% are singletons); average occurrences: 1.391; average spacing: 8.398; total length: 4087919471

Zerui

M-D75 commented 2 months ago

Hmm,

Thank you for all your tests.

The INSIDER/VARIANT_CALLING/pm_against_ref.sam file in question may not be suitable for the sam2delta.py script, which is provided by Assemblytics. If possible, I would like you to send me the SAM file so I can study it. Only a part of the file is necessary, as I am aware of data confidentiality concerns.

For this, you will need to modify the TrEMOLO/run.snk file at line 320 and add the --keep-incomplete option to the snakemake command :

# Before at line 320
#snakemake --snakefile ${{path_to_pipline}}/Snakefile --configfile {params.name_configfile} 2>> {params.work_directory}/log/Snakefile_insider.err | tee -p {params.work_directory}/log/Snakefile_insider.log \
# After
snakemake --snakefile ${{path_to_pipline}}/Snakefile --configfile {params.name_configfile} --keep-incomplete 2>> {params.work_directory}/log/Snakefile_insider.err | tee -p {params.work_directory}/log/Snakefile_insider.log \

Once the error has occurred in the INSIDER part, you can stop the pipeline. The --keep-incomplete option will allow you to keep the output file INSIDER/VARIANT_CALLING/pm_against_ref.sam, which I would like to analyze. Even 5% of the file would be sufficient.
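For sharing only part of the SAM file, the header lines should be kept so the excerpt stays readable by SAM tools. A minimal Python sketch, assuming the file is small enough to hold in memory (for a very large file a two-pass streaming version would be preferable); the function name sam_excerpt and the output file name are hypothetical:

```python
def sam_excerpt(sam_lines, percent=5):
    """Keep all header lines plus the first `percent` % of alignment records.

    Hypothetical helper; loads every line in memory, which is an assumption
    that may not hold for large alignments.
    """
    lines = list(sam_lines)
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if not line.startswith("@")]
    # Integer arithmetic avoids floating-point rounding; keep at least one
    # record when the file has any.
    keep = max(1, len(records) * percent // 100) if records else 0
    return header + records[:keep]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        excerpt = sam_excerpt(handle, percent=5)
    with open("pm_against_ref.excerpt.sam", "w") as out:
        out.writelines(excerpt)
```

Truncating by record count rather than by bytes avoids cutting an alignment line in the middle.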

If you cannot provide the file, I will try another strategy.

Thank you again for your time, M-D

zeruiWang commented 2 months ago

Hi, thank you for your help. I modified the TrEMOLO/run.snk file and found that the output file INSIDER/VARIANT_CALLING/pm_against_ref.sam is empty. So, does this mean there is an issue with my REF and GENOME files? I'll check whether there's an issue with my Flye settings and retry minimap2.

./minimap2 -ax map-pb ref.fa pacbio.fq.gz > aln.sam        # PacBio CLR genomic reads
./minimap2 -ax map-ont ref.fa ont.fq.gz > aln.sam          # Oxford Nanopore genomic reads
./minimap2 -ax map-hifi ref.fa pacbio-ccs.fq.gz > aln.sam  # PacBio HiFi/CCS genomic reads (v2.19 or later)

Perhaps I need to use the map-hifi option? Zerui

M-D75 commented 2 months ago

😮 Interesting.

Yes, I think the problem might indeed originate from the REFERENCE or the GENOME. Have you checked if the FASTA format is properly adhered to?
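One way to check the FASTA files is a small script that looks for the most common formatting problems. The following Python sketch is only an illustration under assumptions: the function name check_fasta is hypothetical, and the three checks it performs (sequence before the first header, duplicate names, records without sequence) are a subset of what could go wrong.

```python
def check_fasta(lines):
    """Return a list of human-readable problems found in FASTA text lines."""
    problems = []
    seen = set()
    current = None       # name of the record being read, None before any header
    has_sequence = True  # whether the current record has sequence lines so far
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None and not has_sequence:
                problems.append(f"record '{current}' has no sequence")
            # The record name is the header text up to the first whitespace.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            if not name:
                problems.append(f"line {number}: header without a name")
            elif name in seen:
                problems.append(f"line {number}: duplicate name '{name}'")
            seen.add(name)
            current, has_sequence = name, False
        elif current is None:
            problems.append(f"line {number}: sequence data before the first header")
            current = ""  # so this problem is reported only once
        else:
            has_sequence = True
    if current is not None and not has_sequence:
        problems.append(f"record '{current}' has no sequence")
    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for problem in check_fasta(handle):
            print(problem)
```

An empty output does not prove the file is valid for every tool, but duplicate or empty records are worth ruling out before rerunning the mapping.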

Could you try a direct mapping like this:

singularity exec TrEMOLO.simg minimap2 -ax asm20 --split-prefix tmp_TrEMOLO -t 40 your_work_directory/INPUT/your_ref_fasta.fasta your_work_directory/INPUT/your_assembly_fasta.fasta > output.sam

What do you get? An error? Is the .sam file still empty?

Could you directly test from the symbolic links located in your_work_directory/INPUT? Because if, for one reason or another, the symbolic links were created incorrectly, that could explain the problem.
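To see whether any link in the INPUT folder is dangling, a short Python sketch such as the one below can help. The function name broken_symlinks is hypothetical, and the check runs on the host, so a link that resolves here may still be unreachable from inside the container if its target directory is not mounted.

```python
import os


def broken_symlinks(directory):
    """Return paths of symbolic links in `directory` whose target is missing."""
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # os.path.exists follows the link, so it is False for a dangling link.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(path)
    return broken


if __name__ == "__main__":
    import sys

    for path in broken_symlinks(sys.argv[1]):
        print("broken link:", path, "->", os.readlink(path))
```

For example, running it on your_work_directory/INPUT prints nothing when every link resolves.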

Regarding the options map-ont, map-hifi, etc., these are only for the OUTSIDER part (SAMPLE vs GENOME). It seems to me that in this part, you did not encounter any problems during the mapping stage, etc. The part where you are encountering the error concerns the INSIDER, which is the first stage of the pipeline where two .fasta files are mapped against each other.

Note that you can launch the pipeline for the OUTSIDER part only, which gives you information between your GENOME and your SAMPLE. For example, you can place your REF file in the GENOME variable and set INSIDER_VARIANT to False in the config.yaml file:

CHOICE:
    PIPELINE:
        OUTSIDER_VARIANT: True  # TE not assembled (out of genome)
        INSIDER_VARIANT: False
....

But of course, depending on the context of your study, you may lose some information.

M-D

zeruiWang commented 2 months ago

Hi, I tested my data using minimap2, as you suggested. Initially, I used 40 threads and found that the process was killed, which I realized might be due to excessive memory usage. After switching to 8 threads, I successfully generated the SAM file. I am currently running the new-module-frequency-multi-generations branch using six threads. So far, there have been no errors. I will give you feedback promptly after the run is completed.

Thank you for your patient guidance! Zerui

zeruiWang commented 2 months ago

Hi, Unlucky again, another error has occurred.

••• [SNK]--[Wed Apr 24 07:18:15 CST 2024] LIFT_OFF •••

AN ERROR OCCURRED! BYE !

[SNK INFO] ERROR PIPELINE; snakefile used : /data/tremolo/output5/SNAKE_USED/Snakefile_outsider.snk
Check LOG : /data/tremolo/output5/log/Snakefile_outsider.log
Check ERROR : /data/tremolo/output5/log/Snakefile_outsider.err
Removing temporary output file /data/tremolo/output5/rep_tmp_snk.

LIFT_OFF.err.log Snakefile_outsider.err.log

Sorry to bother you again. Zerui

M-D75 commented 2 months ago

Hi,

Thanks for your time,

In the LIFT_OFF.err.log file, there is this error:

/usr/bin/bash: line 23: liftoff: command not found

The Liftoff tool has recently been added to the new-module-frequency-multi-generations branch.

This is likely because you used the Singularity container generated by the master branch or a precompiled version. If you are using TrEMOLO from the new-module-frequency-multi-generations branch, you should also use the container from that branch. To check which container is the correct one, you can run this command:

singularity exec TrEMOLO.simg liftoff -h

If you do not get the liftoff: command not found error, then you are using the correct container.

Sorry again for this. M-D

M-D75 commented 2 months ago

Hi,

Here is a precompiled version of the new Singularity container if needed. The new version of the pipeline is now available on the master branch. Please remember that two parameters have been added to the config.yaml file.

Best regards, M-D

zeruiWang commented 2 months ago

Hi, Thank you for the update, but I still encountered some issues. Because I'm now using 6 threads, it's running slower, and the errors didn't show up until yesterday afternoon.

[Thu Apr 25 01:33:29 CST 2024] LOG TASK /data/tremolo/output5/log/LIFT_OFF.out, /data/tremolo/output5/log/LIFT_OFF.err

••• [SNK]--[Thu Apr 25 01:33:29 CST 2024] LIFT_OFF •••

extracting features
BYE !
AN ERROR OCCURRED!

[SNK INFO] ERROR PIPELINE; snakefile used : /data/tremolo/output5/SNAKE_USED/Snakefile_outsider.snk
Check LOG : /data/tremolo/output5/log/Snakefile_outsider.log
Check ERROR : /data/tremolo/output5/log/Snakefile_outsider.err

Removing temporary output file /data/tremolo/output5/rep_tmp_snk.
[Fri Apr 26 16:07:27 2024] Finished job 0.
1 of 1 steps (100%) done
Complete log: /data/tremolo/.snakemake/log/2024-04-24T120334.456848.snakemake.log

LIFT_OFF.err.log Snakefile_outsider.err.log

I checked the LIFT_OFF.err file, but I couldn't find where the problem lies. I'm unable to resolve it on my own, so I apologize for the inconvenience, but I need to ask for your help again. Thanks, Zerui

M-D75 commented 2 months ago

Hi,

Thank you for this information. At the moment, I don't really have an idea about the source of the problem. I think I will update to include more information in the log files so that I can identify the issue.

I will let you know once the update has been made. I hope you will be able to run it so that I can find a solution.

I do have one question, though: did the SV_INSIDER part go well?

M-D

zeruiWang commented 2 months ago

Hi, Yes, the SV_INSIDER part has been successfully completed. Thank you very much for your assistance. Zerui

M-D75 commented 2 months ago

Hi,

The update is available on the master branch.

M-D

zeruiWang commented 2 months ago

Hi,

I updated and tested again, but the same error still happened.

LIFT_OFF.err.log Snakefile_outsider.err.log

Zerui

M-D75 commented 2 months ago

Hi,

Thank you again for all your tests.

/usr/bin/bash: line 37: 3745547 Killed                  liftoff -p 20 -g /data/tremolo/output1/OUTSIDER/INSIDER_VR/INOUTSIDER.gff ...

I think the problem is the same as the one you mentioned above: the system may kill the process due to insufficient resources. Could you run the tool on a cluster node with much more RAM allocated?

COUNT: 57858 /data/tremolo/output1/OUTSIDER/INSIDER_VR/INOUTSIDER.gff
...
... Peak RSS: 31.461 GB

Indeed, you have a rather large number of transposable elements to process, which consumes a considerable amount of resources. I believe that the added transposable elements significantly increase the genome size, which requires more resources during mapping. However, the mapping step seems to be going well; Liftoff must consume more RAM after this first stage.
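The two clues quoted above (a "Killed" line from bash and the "Peak RSS" value) can be pulled out of an .err file automatically. This is a minimal Python sketch that assumes the log lines look like the excerpts in this thread; the function name summarize_err_log and both regular expressions are illustrative, not part of TrEMOLO.

```python
import re

# Matches bash messages such as:
#   /usr/bin/bash: line 37: 3745547 Killed      liftoff -p 20 ...
KILLED = re.compile(r"line \d+:\s+(\d+)\s+Killed\s+(\S+)")
# Matches memory reports such as: "... Peak RSS: 31.461 GB"
PEAK_RSS = re.compile(r"Peak RSS:\s*([\d.]+)\s*GB")


def summarize_err_log(lines):
    """Return (names of killed commands, highest Peak RSS in GB or None)."""
    killed = []
    peak = None
    for line in lines:
        match = KILLED.search(line)
        if match:
            killed.append(match.group(2))
        match = PEAK_RSS.search(line)
        if match:
            value = float(match.group(1))
            peak = value if peak is None else max(peak, value)
    return killed, peak


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        killed, peak = summarize_err_log(handle)
    print("killed commands:", killed)
    print("highest Peak RSS (GB):", peak)
```

A "Killed" entry with no Python traceback usually points to the operating system stopping the process, which is consistent with a memory limit, though the log alone cannot confirm the cause.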

M-D

zeruiWang commented 2 months ago

Okay, I will adjust my TE data and try again. Meanwhile, I have an idea - can I simultaneously use the assembly data of WT mice and the sample data of Hom mice to discover transposon differences between individual Hom and WT mice?

Zerui

M-D75 commented 2 months ago

Sorry, I conducted a resource consumption check on the Liftoff program and it turns out that only the mapping part of this program is resource-intensive. Thus, I indeed have difficulty assessing the problem.

Could you try running the command directly with 20 threads or more?

singularity run TrEMOLO.simg liftoff -p 20 -g /data/tremolo/output1/OUTSIDER/INSIDER_VR/INOUTSIDER.gff -f /data/tremolo/output1/OUTSIDER/INSIDER_VR/feature.txt -o /data/tremolo/output1/OUTSIDER/INSIDER_VR/output_INOUT.gff3 /data/tremolo/output1/INPUT/Mus_musculus.GRCm38.dna.toplevel.fa /data/tremolo/output1/OUTSIDER/TE_TOWARD_GENOME/PSEUDO_GENOME_TE_DB_ID.fasta

M-D75 commented 2 months ago

> Okay, I will adjust my TE data and try again. Meanwhile, I have an idea - can I simultaneously use the assembly data of WT mice and the sample data of Hom mice to discover transposon differences between individual Hom and WT mice?
>
> Zerui

Do you mean to use multiple assemblies at the same time, against sampling? Unfortunately, this is not directly possible, but it's an interesting idea that we could implement in future updates. I will keep your idea in mind.

Indirectly, you could run TrEMOLO multiple times, each time setting a different assembly as the GENOME parameter and your reads as SAMPLE, without setting a REFERENCE, and by setting the INSIDER_VARIANT: False parameter. However, your assemblies probably diverge from each other, which poses a problem for annotation. You would need to conduct a post-TrEMOLO analysis to compare common positions across the different assemblies, for example using LiftOFF :).

Unless your goal is just to obtain the differences; in that case, yes.

M-D

zeruiWang commented 2 months ago

Hi, I ran the command and an error occurred.

(base) root@dell-virtual-machine:/data/tremolo# singularity run TrEMOLO.simg liftoff -p 20 -g /data/tremolo/output1/OUTSIDER/INSIDER_VR/INOUTSIDER.gff -f /data/tremolo/output1/OUTSIDER/INSIDER_VR/feature.txt -o /data/tremolo/output1/OUTSIDER/INSIDER_VR/output_INOUT.gff3 /data/tremolo/output1/INPUT/Mus_musculus.GRCm38.dna.toplevel.fa /data/tremolo/output1/OUTSIDER/TE_TOWARD_GENOME/PSEUDO_GENOME_TE_DB_ID.fasta
extracting features
2024-05-04 18:29:10,612 - INFO - Populating features
2024-05-04 18:29:14,090 - INFO - Populating features table and first-order relations: 57883 features
2024-05-04 18:29:14,090 - INFO - Updating relations
2024-05-04 18:29:14,362 - INFO - Creating relations(parent) index
2024-05-04 18:29:14,362 - INFO - Creating relations(child) index
2024-05-04 18:29:14,362 - INFO - Creating features(featuretype) index
2024-05-04 18:29:14,377 - INFO - Creating features (seqid, start, end) index
2024-05-04 18:29:14,402 - INFO - Creating features (seqid, start, end, strand) index
2024-05-04 18:29:14,426 - INFO - Running ANALYZE features
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pyfaidx/__init__.py", line 358, in __init__
    self.file = self._fasta_opener(filename, 'r+b'
FileNotFoundError: [Errno 2] No such file or directory: '/data/tremolo/output1/INPUT/Mus_musculus.GRCm38.dna.toplevel.fa'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/liftoff", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/liftoff/run_liftoff.py", line 12, in main
    run_all_liftoff_steps(args)
  File "/usr/local/lib/python3.8/dist-packages/liftoff/run_liftoff.py", line 24, in run_all_liftoff_steps
    feature_db, feature_hierarchy, ref_parent_order = liftover_types.lift_original_annotation(ref_chroms, target_chroms,
  File "/usr/local/lib/python3.8/dist-packages/liftoff/liftover_types.py", line 15, in lift_original_annotation
    align_and_lift_features(ref_chroms, target_chroms, args, feature_hierarchy, liftover_type, unmapped_features,
  File "/usr/local/lib/python3.8/dist-packages/liftoff/liftover_types.py", line 23, in align_and_lift_features
    aligned_segments= align_features.align_features_to_target(ref_chroms, target_chroms, args,
  File "/usr/local/lib/python3.8/dist-packages/liftoff/align_features.py", line 16, in align_features_to_target
    target_fasta_dict = split_target_sequence(target_chroms, args.target, args.dir)
  File "/usr/local/lib/python3.8/dist-packages/liftoff/align_features.py", line 32, in split_target_sequence
    Faidx(target_fasta_name)
  File "/usr/local/lib/python3.8/dist-packages/pyfaidx/__init__.py", line 367, in __init__
    raise FastaNotFoundError(
pyfaidx.FastaNotFoundError: Cannot read FASTA file /data/tremolo/output1/INPUT/Mus_musculus.GRCm38.dna.toplevel.fa

It's strange because the file is actually here.

(base) root@dell-virtual-machine:/data/tremolo# ls /data/tremolo/output1/INPUT/Mus_musculus.GRCm38.dna.toplevel.fa
/data/tremolo/output1/INPUT/Mus_musculus.GRCm38.dna.toplevel.fa

Zerui

M-D75 commented 2 months ago

> Okay, I will adjust my TE data and try again. Meanwhile, I have an idea - can I simultaneously use the assembly data of WT mice and the sample data of Hom mice to discover transposon differences between individual Hom and WT mice?
>
> Zerui

I think you can turn to this tool to do that : https://github.com/cgroza/GraffiTE

M-D

M-D75 commented 2 months ago

> It's strange because the file is actually here.
>
> (base) root@dell-virtual-machine:/data/tremolo# ls /data/tremolo/output1/INPUT/Mus_musculus.GRCm38.dna.toplevel.fa
> /data/tremolo/output1/INPUT/Mus_musculus.GRCm38.dna.toplevel.fa
>
> Zerui

Hi,

I think the problem stems from the fact that there are symbolic links in the INPUT folder. It's possible that the symbolic link was incorrectly created, or that it cannot be followed back to the original file from inside the container. Could you try again by directly specifying the original file's path? Alternatively, add the -B option to bind the directory that holds the original file:

singularity run -B /data TrEMOLO.simg liftoff -p 20 -g ...
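To work out which directories would need to be bound, one can resolve each input path on the host and collect both the directory of the link and the directory of its real target. The Python sketch below is an illustration only: the function name bind_args is hypothetical, and whether an unbound link target is actually the cause here has not been confirmed in this thread.

```python
import os


def bind_args(paths):
    """Build '-B <dir>' arguments covering each path and its symlink target."""
    dirs = []
    for path in paths:
        candidates = (
            os.path.dirname(os.path.abspath(path)),   # where the link lives
            os.path.dirname(os.path.realpath(path)),  # where the real file lives
        )
        for directory in candidates:
            if directory not in dirs:
                dirs.append(directory)
    args = []
    for directory in dirs:
        args += ["-B", directory]
    return args


if __name__ == "__main__":
    import sys

    # Prints the bind options to paste after "singularity run".
    print(" ".join(bind_args(sys.argv[1:])))
```

If the reference in INPUT is a link to a file stored on another mount point, the output shows that second directory, which is the one the container may not see by default.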

M-D