Open lilypeck opened 6 months ago
Hello,
Thank you for your feedback.
48 hours is indeed abnormally long, especially for Sniffles. However, I just realized that Sniffles was always running with 3 threads. An update is available that allows you to choose the number of threads by increasing the THREADS parameter in the config.yaml file.
To get the new update:
git pull
#or
git clone https://github.com/DrosophilaGenomeEvolution/TrEMOLO.git
However, "171991 lines" is very few. I think I have encountered this problem before, and I believe it was due to one of the versions of Singularity I had used. Changing the version solved the problem, but I am not too sure. I will try to identify the issue.
Best regards. M-D
Did you encounter the same problem with the test datasets?
M-D
Hi M-D
Sorry for my slow reply. The test datasets ran fine, I did not encounter the same problem. See attached .log file.
I am running tremolo through singularity, is it possible to update the .simg image?
Thank you
Lily
Hi @M-D75 Please do let me know if you identified the issue, or if the .simg image has also been updated? Thanks Lily
I am really really sorry for the delayed response.
On my end, the issue seems to be potentially hardware-related, but I am not sure. I ran the same analysis on different compute nodes of a cluster, and the problem occurs on a few nodes that are generally identified as having old hardware. Do you run your analyses on a compute cluster? Have you tried running them on another node?
Again, I apologize for the delayed response. Here is what I can do: I can modify the Singularity container to include a new version of Sniffles, or we could provide an option to skip Sniffles and retain only the extraction of INDELs indicated by the CIGAR strings in the alignment file. Either way, it will take a few days to update the pipeline and perform tests.
Another question: did you build the container (the .simg image) yourself, or did you get it from this link?
"... is it possible to update the .simg image?"
What kind of update did you have in mind? A Sniffles update?
Sorry again, M-D
Hi @M-D75
No problem, thank you for your reply.
Yes, I run my analyses on a computing cluster. Each time I run it, the system automatically assigns it to a node, which 99% of the time is a different node from the previous one.
I downloaded the .simg image from the link. The update I referenced was the sniffles one you suggested in your original response. Happy with whichever option you think is best! I can have a go and let you know if it has worked or not?
Thank you for your help!
Lily
Hi,
An update is available. Simply replace CALL_SV: "sniffles" with CALL_SV: "no_sniffles" in your .yaml configuration file. This will run the pipeline without the Sniffles part.
Risk: Lower TE detection.
I hope this resolves your issue.
Do not hesitate to report any other issues. There are still other updates to come.
Best, M-D
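The configuration switch described in the comment above could be scripted roughly as follows. This is only a sketch: it assumes the CALL_SV key sits on a single line in the form quoted above, and the function name disable_sniffles is hypothetical, not part of TrEMOLO. A backup of the config is kept next to the original.

```shell
# Hypothetical helper (not part of TrEMOLO): switch CALL_SV from "sniffles"
# to "no_sniffles". Assumes the key is on one line, e.g.  CALL_SV: "sniffles"
disable_sniffles() {
    config="$1"
    # Keep a backup, then rewrite the config from it.
    cp -- "$config" "$config.bak"
    sed -E 's/^([[:space:]]*CALL_SV:[[:space:]]*)"?sniffles"?/\1"no_sniffles"/' \
        "$config.bak" > "$config"
    # Show the resulting line so the change can be checked by eye.
    grep -n 'CALL_SV' "$config"
}
```

Running it a second time leaves the file unchanged, because the pattern only matches a value that starts with sniffles.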
Hi @M-D75
Thank you very much for your help.
I have re-run my script, but I am getting the following error message regarding SVs. Could this be caused by running without Sniffles? (See output file for full details.)
MissingInputException in line 2923 of /u/project/vlsork/ldpeck/tremolo/TrEMOLO/Snakefile:
Missing input files for rule TrEMOLO_SV_TE:
TrEMOLO_OUTPUT_barcode03/OUTSIDER/VARIANT_CALLING/SV.vcf
[SNK INFO] DRY RUN ERROR PIPELINE : please check your config file
Thank you!
Lily
Hi,
Sorry, first you need to download the update :
git pull
#or
git clone https://github.com/DrosophilaGenomeEvolution/TrEMOLO.git
Then you can restart the pipeline from the new update. Let me know if it works.
Best, M-D
Hi @M-D75
Thank you, I have now updated both the .simg and the git repository.
However, it is still getting stuck on calling SVs: it started on 2nd July and is still running today, but the .log hasn't updated since 3rd July.
My runscript was apptainer exec TrEMOLO.simg snakemake --snakefile TrEMOLO/run.snk --configfile scripts/barcode03_ragtag.yaml
See attached files. Any help would be much appreciated.
Thank you
Lily
Hi,
This is strange. I will consider other alternatives, as I am currently having difficulty identifying the problem. I will contact you again if any changes are made.
Sorry, M-D
Hi,
Sorry, I haven’t found a solution for now. It is difficult when we cannot reproduce the bug in question. I would like to have more information. Could you please send me the OUTSIDER/MAPPING/stats.txt file from your analysis?
Best, M-D
Hi M-D
Thank you very much. I have attached the stats.txt file.
Let me know.
Thanks
Lily

raw total sequences: 4280038
filtered sequences: 0
sequences: 4280038
is sorted: 0
1st fragments: 4280038
last fragments: 0
reads mapped: 4279191
reads mapped and paired: 0 # paired-end technology bit set + both mates mapped
reads unmapped: 847
reads properly paired: 0 # proper-pair bit set
reads paired: 0 # paired-end technology bit set
reads duplicated: 0 # PCR or optical duplicate bit set
reads MQ0: 48492 # mapped and MQ=0
reads QC failed: 0
non-primary alignments: 4994972
total length: 38779490476 # ignores clipping
total first fragment length: 38779490476 # ignores clipping
total last fragment length: 0 # ignores clipping
bases mapped: 38778690649 # ignores clipping
bases mapped (cigar): 38511917654 # more accurate
bases trimmed: 0
bases duplicated: 0
mismatches: 3155994396 # from NM fields
error rate: 8.194851e-02 # mismatches / bases mapped (cigar)
average length: 9060
average first fragment length: 9061
average last fragment length: 0
maximum length: 221426
maximum first fragment length: 0
maximum last fragment length: 0
average quality: 33.5
insert size average: 0.0
insert size standard deviation: 0.0
inward oriented pairs: 0
outward oriented pairs: 0
pairs with other orientation: 0
pairs on different chromosomes: 0
percentage of properly paired reads (%): 0.0
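For readers skimming the statistics above, the headline figures (fraction of reads mapped, total bases) can be pulled out with a few lines of awk. This is a sketch that assumes the summary is stored one "key: value" pair per line, as pasted in this thread; raw samtools stats output carries an extra SN prefix and tab separators, which this does not handle. The function name summarise_stats is made up for illustration.

```shell
# Hypothetical helper: summarise a stats.txt whose lines look like
#   reads mapped: 4279191
#   total length: 38779490476 # ignores clipping
summarise_stats() {
    awk -F': *' '
        /^raw total sequences:/ { total  = $2 + 0 }   # "+ 0" drops any trailing comment
        /^reads mapped:/        { mapped = $2 + 0 }
        /^total length:/        { bases  = $2 + 0 }
        END {
            if (total == 0) { print "no reads found"; exit 1 }
            printf "mapped: %.2f%%\n", 100 * mapped / total
            printf "total bases: %.1f Gb\n", bases / 1e9
        }' "$1"
}
```

On the figures above this gives 99.98% of reads mapped and 38.8 Gb of sequence.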
Hi,
38 Gb of data is not much, so it should not take 48 hours.
To avoid restarting everything, can you test this:
apptainer exec TrEMOLO.simg sniffles -t 15 --report-seq -s 1 -m /path/to/your/work_directory/OUTSIDER/MAPPING/SAMPLE_mapping_GENOME_MD.sorted.bam -v /path/to/your/OUTSIDER/VARIANT_CALLING/SV.vcf -n -1
Here -t 15 sets 15 threads; adjust it to your capacity, and replace /path/to/your/work_directory/ accordingly.
I would like to know whether the 48-hour run time comes directly from the attempt to extract SVs from your data, or from something else, such as an outdated version of Snakemake. If it is the latter, I can update as many programs as necessary.
If it takes more than 24 hours, there is no point in continuing: given the amount of data, it should take less than 24 hours.
thanks, M-D
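The 24-hour cut-off suggested above can be enforced automatically rather than by watching the job. A minimal sketch, assuming GNU coreutils timeout is available on the cluster; the wrapper name run_with_limit is hypothetical.

```shell
# Hypothetical wrapper: run a command with a wall-clock limit.
# GNU coreutils `timeout` exits with status 124 when the limit is reached.
run_with_limit() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "stopped: command exceeded $limit" >&2
    fi
    return "$status"
}

# Usage with the command from this thread:
#   run_with_limit 24h apptainer exec TrEMOLO.simg sniffles <same arguments as above>
```

An exit status of 124 then distinguishes "too slow" from a genuine Sniffles failure, which returns its own status.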
Hi M-D
Thank you very much for your help.
I ran the command you suggested and it ran for over 24 hours; see attached job log.
Let me know if there is something else I can try.
Thanks
Lily

Hi,
There will be an update this Tuesday, with a few small changes to package versions that I hope will resolve the issue.
Best, M-D
Hi,
Sorry, I forgot to tell you.
The update is on a new branch.
Command to fetch the update:
git clone -b fix_issue_23 https://github.com/DrosophilaGenomeEvolution/TrEMOLO.git
However, you should rebuild the Singularity (Apptainer) container with this update:
sudo singularity build TrEMOLO.simg TrEMOLO/Singularity
Some packages have been updated, in particular the package responsible for parsing the alignment file.
You can then just try running the sniffles command as before:
apptainer exec TrEMOLO.simg sniffles -t 15 --report-seq -s 1 -m /path/to/your/work_directory/OUTSIDER/MAPPING/SAMPLE_mapping_GENOME_MD.sorted.bam -v /path/to/your/OUTSIDER/VARIANT_CALLING/SV.vcf -n -1
Best, M-D
Hi M-D
Thank you very much for your help. Could I check whether you updated the pre-compiled Singularity container? I don’t have sudo rights as I’m using a server, so previously I downloaded the pre-compiled container. I have tried this update, but it still takes >24 hours to complete.
Thanks
Lily
Hi,
Sorry, I'll send you a download link.
Sorry again, M-D
Hi @M-D75 I'm sorry for the delay, could you please re-send as the link has expired? Thanks Lily
Hi @M-D75
Unfortunately it is still not finishing in 24 hours, see attached files. Would it help if I shared a google drive folder containing my input files?
thanks
Lily
Hi,
Yes, it would help a lot if you could share the input files. I have a small idea of the potential problem. Having the input data would allow me to verify my hypothesis.
Thanks, M-D
Great thank you, are you able to share an email address please and I will send you a link to the folder?
Thanks
Lily
Yes, of course.
mourdas.mohamed[]ird.fr
Great, I have shared the folder with your email
https://drive.google.com/drive/folders/14GMXEKuCr2k6BN4Pdrbl9O4NGYmFNlKE?usp=sharing
Thank you, I got it. I will keep you informed whether I have found the solution or not.
Hi,
I was able to check a few things. Sniffles takes about 5 days with 20 threads on your data. There is no blocking or bug as I initially thought; it's just very slow. So, in the end, it's mostly an optimization issue. I have a few ideas that I will test to improve the speed, but I am still unsure if this will be at the risk of losing some information and, if so, to what extent.
I haven't run the entire pipeline on your data yet, but I imagine it would take more than a week, which is far too long. However, by skipping some lengthy steps, except for Sniffles, I believe we could reduce the processing time to 5 days, albeit at the cost of a less comprehensive TE detection.
I will test different options, and if I get complete data, I will send it to you along with the fix so you can apply it to other datasets.
Thank you for your patience, M-D
Hi @M-D75
Thank you very much for your help! If possible, complete data + fix would be great. I will wait to hear from you.
Thanks
Lily
Hi,
Just to let you know that I finally found the problem and was able to improve the speed on certain points. I'll let you know more once all the checks are complete.
Thanks for your patience, M-D
Hi,
I wanted to inform you that I was able to execute the pipeline on your data after modifying the code. The entire execution took about 14 hours using 20 CPUs, and the part that was lengthy and problematic took 4 hours.
I will send you a link to retrieve the output. A version of the tool will be provided with specific instructions, as it is not yet fully stable for all cases. I will continue to conduct tests and improve performance in certain areas.
Best, M-D
Hi,
That is great news, thank you very much
Thanks
Lily
Hi,
Could you give me an e-mail address so that I can share the link? I'm having trouble uploading some of the data.
Best, M-D
Hi,
Thank you! It is ldpeck[at]ucla.edu
Thanks
Lily
Hi,
The main tests are complete. You can clone the fix_issue_23 branch with the command:
git clone https://github.com/DrosophilaGenomeEvolution/TrEMOLO.git -b fix_issue_23
Or from your existing repository:
git pull
git checkout fix_issue_23
For the Singularity container, please use the previous version available on the Git repository.
Important notes:
A new parameter, TIME_LIMIT, has been added to the configuration file. It specifies the maximum number of hours you are willing to allocate for the task of retrieving potential TE insertions. If the value is set to 0, there will be no time limit. However, with the modifications made to the pipeline, 4 hours were sufficient using 20 threads.
You can keep the value CALL_SV: no_sniffles.
Do not enable CLIPPED_READS (CLIPPED_READS: True), as it would result in excessively long processing times for your data.
I hope this will work for you. I would be interested to know if it does. Thanks again for your help.
Thanks, M-D
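The notes above can be turned into a quick pre-flight check before submitting a long job. This is a sketch under assumptions: it treats each key as sitting on its own line in the KEY: value form quoted in this thread, it does not parse YAML properly, and the function name check_tremolo_config is hypothetical.

```shell
# Hypothetical pre-flight check for the settings recommended in this thread.
# Returns non-zero if TIME_LIMIT is missing or CLIPPED_READS is enabled.
check_tremolo_config() {
    config="$1"
    problems=0
    if ! grep -Eq '^[[:space:]]*TIME_LIMIT:' "$config"; then
        echo "TIME_LIMIT not found (is the fix_issue_23 branch checked out?)" >&2
        problems=1
    fi
    if grep -Eq '^[[:space:]]*CLIPPED_READS:[[:space:]]*"?True"?' "$config"; then
        echo "CLIPPED_READS is enabled; expect very long run times" >&2
        problems=1
    fi
    # Keeping no_sniffles is optional, so this is only reported, not treated as an error.
    if ! grep -Eq '^[[:space:]]*CALL_SV:[[:space:]]*"?no_sniffles"?' "$config"; then
        echo "note: CALL_SV is not set to no_sniffles" >&2
    fi
    return "$problems"
}
```

A typical use would be check_tremolo_config scripts/barcode03_ragtag.yaml at the top of the job script, before the snakemake call.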
Hi M-D
Thank you very much for your time with this fix.
I haven’t been able to download the files yet as I am travelling; once I’m back in the office I can download them.
I will let you know how I get on.
Thanks
Lily
On 13 Nov 2024, at 01:40, M-D75 @.***> wrote:
The main tests are complete. You can clone the fix_issue_23 branch as follows:
Using the command:
git clone https://github.com/DrosophilaGenomeEvolution/TrEMOLO.git -b fix_issue_23 Or from your existing repository:
git pull git checkout fix_issue_23 For the Singularity container, please use the previous version available on the Git repository https://github.com/DrosophilaGenomeEvolution/TrEMOLO/releases/download/v2.5.4b/TrEMOLO.simg.
Important notes:
A new parameter, TIME_LIMIT, has been added to the configuration file. It specifies the maximum number of hours you are willing to allocate for the task of retrieving potential TE insertions. If the value is set to 0, there will be no time limit. However, with the modifications made to the pipeline, 4 hours were sufficient using 20 threads. You can keep the value CALL_SV: no_sniffles. Do not enable CLIPPED_READS (CLIPPED_READS: True), as it would result in excessively long processing times for your data.
I hope this will work for you. I would be interested to know if it does. Thanks again for your help.
Thanks, M-D
— Reply to this email directly, view it on GitHub https://github.com/DrosophilaGenomeEvolution/TrEMOLO/issues/20#issuecomment-2472996784, or unsubscribe https://github.com/notifications/unsubscribe-auth/AZ3VAW2CY7IGAVP5CSNMBFL2AMNBLAVCNFSM6AAAAABII6VCGWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINZSHE4TMNZYGQ. You are receiving this because you authored the thread.
Hi,
Noted. I think the link has expired; I'll generate another.
M-D
Hi M-D
I am now back in the office and able to download.
Thanks
Lily
Hi,
OK I'll prepare the links.
M-D
Hello
Thank you for this great tool.
I am trying to call TE insertions on a tree genome which is ~ 830 Mb.
My config.yaml file is as follows:
I run this file with a job script as follows:
Tremolo seems to be running okay, but it doesn't finish after 48 hours. When I re-run the script, first with --unlock and then with --rerun-incomplete, it says that TrEMOLO_OUTPUT_barcode03/OUTSIDER/VARIANT_CALLING/SV.vcf seems to be incomplete (see tremolo-run.sh.o3303866). How do I check whether this file is complete? It has 171991 lines. The only way I can restart the script is to delete this file and start it again, in which case it again doesn't finish within 48 hours (see tremolo-run.sh.o3313962). Is it normal to take so long to run? I am mostly interested in TE insertions, so alternatively, is it possible to switch off calling SVs in case this speeds it up?
Thank you!
Lily
tremolo-run.sh.o3278150.txt tremolo-run.sh.o3313962.txt
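On the question of checking whether SV.vcf is complete: a line count alone cannot tell, and as far as I know Snakemake flags an output as incomplete because the job writing it was interrupted, not because it inspected the file. A rough structural check is still possible. The sketch below is a heuristic only; the function name vcf_looks_complete is made up, and a file can pass it and still be missing records.

```shell
# Hypothetical heuristic: does a VCF look structurally intact?
vcf_looks_complete() {
    vcf="$1"
    # A VCF needs the column header line that starts with #CHROM.
    grep -q '^#CHROM' "$vcf" || { echo "no #CHROM header line" >&2; return 1; }
    # A writer killed mid-record usually leaves a last line without a newline.
    [ -z "$(tail -c 1 "$vcf")" ] || { echo "last line is not newline-terminated" >&2; return 1; }
    # The last line should carry at least the 8 fixed VCF columns.
    cols=$(tail -n 1 "$vcf" | awk -F'\t' '{ print NF }')
    [ "$cols" -ge 8 ] || { echo "last record has only $cols columns" >&2; return 1; }
    echo "$(grep -vc '^#' "$vcf") records, last line intact"
}
```

If the check fails, deleting the file and re-running that step, as described above, is the safe route.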