andreaminio / HaploSync

Tools for haplotype-wise reconstruction of pseudomolecules

Error Rscript #3

Closed hrpelg closed 2 years ago

hrpelg commented 2 years ago

Ciao,

I am getting the following error while running HaploSplit using just a reference for now. Could you advise what the issue is?

Command:

python HaploSplit.py -i $assembly -g $ref --align --haplodup --reuse_intermediate -o HaploSplit_ref_assembly

Error

Running command: Rscript -e 'library(rmarkdown) ; rmarkdown::render( "HaploSplit_ref_X2250.structure_comparison/index.rejected_sequences.html.Rmd" , knit_root_dir = "/path to/HaploSync" , output_dir = "/HaploSplit_ref_assembly.structure_comparison" , output_file = "index.rejected_sequences.html")'
Traceback (most recent call last):
  File "HaploSplit.py", line 3456, in <module>
    main()
  File "HaploSplit.py", line 3401, in main
    refID = hap2_to_ref[queryID]
KeyError: ','

Cheers

andreaminio commented 2 years ago

Hi Elena,

It seems there is something weird going on with the IDs of the reconstructed pseudomolecules... somehow one of them turned out to be a ",". I'm not sure how that could happen; the tool is not supposed to use commas in IDs at any point. Let me check what is happening. Can you please confirm that you have 2 assembled haplotypes for each chromosome?

Andrea

andreaminio commented 2 years ago

Ok, I guess I found the buggy line. It should be fixed now. Please give it another try after pulling the update from the git repository.

Andrea

hrpelg commented 2 years ago

Thanks Andrea,

It seems the error had something to do with --haplodup. I re-ran it, still using the non-updated script, with

python HaploSplit.py -i $assembly -g $ref --align --reuse_intermediate --avoid_rejected_qc -o HaploSplit_ref

That all worked fine.

I have a question: if I now try the fixed script with --haplodup, do I need to give it an annotation file? If so, does it have to be an annotation file from the reference genome or from the actual assembly?

Thanks for your early reply :-)

E

andreaminio commented 2 years ago

Hi Elena,

Yes, it was an issue in HaploSync while producing the info required for HaploDup: it was looping over the wrong sequence IDs. It should now be working correctly and allow HaploDup to be run internally and automatically.

As you are running HaploSync without genetic markers, I strongly suggest having some sort of gene annotation for the draft assembly. This will make it easier to spot any chimeras in the assembly. If you haven't annotated the draft genome yet, a simple but quite functional solution is to map the reference genome genes onto the draft assembly (e.g. with GMAP). Unique genes would do a great job as "markers", since you expect to find them in single copy (or at least in low copy number) in your draft as well.
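The single-copy idea above can be sketched in Python. This is only an illustration and not part of HaploSync: it assumes the reference genes have already been mapped onto the draft assembly and that the hits are available as (gene_id, contig, start, end) tuples. The function name `single_copy_hits`, the `max_copies` parameter, and the sample records are all hypothetical.

```python
from collections import defaultdict


def single_copy_hits(hits, max_copies=1):
    """Keep genes that map at most `max_copies` times on the draft assembly.

    `hits` is an iterable of (gene_id, contig, start, end) tuples. This
    layout is an assumption made for the example, not a HaploSync format.
    """
    by_gene = defaultdict(list)
    for gene_id, contig, start, end in hits:
        by_gene[gene_id].append((contig, start, end))
    # Genes with more locations than allowed are dropped entirely.
    return {gene: locs for gene, locs in by_gene.items() if len(locs) <= max_copies}


# Made-up records: geneA maps once, geneB maps twice.
example_hits = [
    ("geneA", "ctg1", 100, 900),
    ("geneB", "ctg1", 5000, 6200),
    ("geneB", "ctg2", 300, 1500),
]
unique = single_copy_hits(example_hits)
```

Raising `max_copies` is one way to allow the "low copy number" case mentioned above, for example when both haplotypes are present in the draft.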

I hope this helps,

Andrea

hrpelg commented 2 years ago

Awesome Andrea!

I am preparing my files from a genetic map built from a segregating population in which one of the parents (the male plant) is also the plant used for the draft assembly. So I will feed that to the pipeline too and add some genes from the draft assembly as you suggested.

I will get back to you once that is done.

Thanks for the feedback

Cheers

E

hrpelg commented 2 years ago

Hi Andrea,

I have tried running HaploSplit with the updated script.

My marker list file gives the marker order with values that are not just integers, since I took cM values from a genetic map, so I changed int(pos) to float(pos) at lines 240 and 241. I hope that doesn't interfere with the rest of the script. However, I get a further error:

Mapping intermediate pseudomolecules to guide genome
Hap1 - Chr01
Running command line: /software/bioinformatics/MUMmer-3.23/nucmer -p HaploSplit_ref_X2250.tmp/intermediate1_Chr01.on.Chr01 -t 20 --forward HaploSplit_ref_X2250.tmp/Chr01.fasta HaploSplit_ref_X2250.tmp/intermediate1_Chr01
Running command line: /software/bioinformatics/MUMmer-3.23/show-coords -c -l -r -T -H HaploSplit_ref_X2250.tmp/intermediate1_Chr01.on.Chr01.delta
ERROR: Could not parse delta file, HaploSplit_ref_X2250.tmp/intermediate1_Chr01.on.Chr01.delta
error no: 400
Reading Hits
Traceback (most recent call last):
  File "HaploSplit.py", line 3458, in <module>
    main()
  File "HaploSplit.py", line 1769, in main
    intermediate_hap1_hits_tiling = used_hits_best_tiling_path(coords_file, fasta_len_dict)[( target_id , query_id )]
KeyError: ('Chr01', 'intermediate1_Chr01')

I could send you the error and output files for more clarification if you need them.

Thank you

Elena

hrpelg commented 2 years ago

Hi Andrea,

I think my issue is that I am using MUMmer 3 instead of v4, so it doesn't recognize -t; version 3, I think, is not multi-threaded. Let's see if I can install version 4 and re-run it.
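If that suspicion is right, one way to guard against it is to pass -t only when the installed nucmer is version 4 or later. The Python sketch below only parses a version string that has already been obtained (for example from the tool's own version output or from the install directory name); the helper names `nucmer_major_version` and `thread_args` are hypothetical, the sample strings are illustrative, and nothing here is taken from HaploSplit.py.

```python
import re


def nucmer_major_version(version_text):
    """Return the major number of the first 'X.Y' pattern found, or None."""
    match = re.search(r"(\d+)\.\d+", version_text)
    return int(match.group(1)) if match else None


def thread_args(version_text, threads):
    """Return ['-t', '<threads>'] for major version >= 4, otherwise []."""
    major = nucmer_major_version(version_text)
    if major is not None and major >= 4:
        return ["-t", str(threads)]
    return []


# Illustrative version strings, not captured from a real run.
args_v3 = thread_args("3.23", 20)
args_v4 = thread_args("4.0.0", 20)
```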

Cheers

E

hrpelg commented 2 years ago

Hi Andrea,

I ran HaploSplit with just the reference, without adding any marker info:

python HaploSplit.py -i $assembly -g $ref --align -c 20 -o HaploSplit_ref_X2250

This ran perfectly.

However, when trying this:

python HaploSplit.py -i $assembly --markers marker_list.txt --map map.tsv -g $ref -l local.paf -c 20 -o HaploSplit_ref_X2250

I get the following error:

tig00007267|+: usable - 3 marker(s) - range [28:46] - makers [['Chr17', 28, '17_8234228', 'tig00007267', 809609, 809732], ['Chr17', 38, '17_8254567', 'tig00007267', 843204, 843330], ['Chr17', 46, '17_10379422', 'tig00007267', 2520632, 2520755]]

Used sequence (in order): tig00006949|-,tig00006983|.,tig00006726|.

Writing sequences

Updating sequence orientation using guide genome

Traceback (most recent call last):
  File "HaploSplit.py", line 3458, in <module>
    main()
  File "HaploSplit.py", line 1631, in main
    query_fasta = query_fasta_db[query_id]
KeyError: 'tig00009425'

That contig has just one marker line, but it is the starting contig of chromosome 14.

Is there something wrong with my input data or am I missing some optional commands? Once I get this running I will try to add haplodup and -gff3 together
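A quick way to rule out the input data is to check that every contig named in the marker file is also present in the assembly FASTA. The Python sketch below is a generic diagnostic and does not reproduce how HaploSplit loads its sequences; the helper names `fasta_ids` and `missing_contigs` and the toy records are hypothetical.

```python
def fasta_ids(fasta_lines):
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' header with no ID
                ids.add(fields[0])
    return ids


def missing_contigs(marker_contigs, fasta_lines):
    """Return the contig IDs used by markers that are absent from the FASTA."""
    return sorted(set(marker_contigs) - fasta_ids(fasta_lines))


# Toy data: ctg3 is referenced by a marker but is not in the FASTA.
toy_fasta = [">ctg1 length=8\n", "ACGTACGT\n", ">ctg2\n", "TTGACA\n"]
absent = missing_contigs(["ctg1", "ctg3"], toy_fasta)
```

With real files, `fasta_lines` can be an open file handle, and the contig column of the marker file has to be extracted first; which column that is depends on the marker file layout.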

Thanks for your help

E

hrpelg commented 2 years ago

Hi @andreaminio,

I wonder if my previous comment made sense and if you know why I am getting that error.

Many thanks

Elena

andreaminio commented 2 years ago

Hi Elena,

Sorry for the late response... kinda busy weeks. I'm trying to figure out why that specific contig is no longer found in the pool of query sequences. Can you please share the full log and error files? I may need to track down where it was used and when it got lost.

A.

hrpelg commented 2 years ago

No worries

Thank you so much for your time. Much appreciated. I have sent the files to your email.

Cheers

E

hrpelg commented 2 years ago

Hi Andrea,

Did you have a chance to look at the error?

Cheers

E

andreaminio commented 2 years ago

Hi Elena,

I’m still waiting for the log files… I found nothing in my email, not even in the spam folder. Maybe I’ll need to share my e-mail address with you…

Let me see


hrpelg commented 2 years ago

Hi Andrea,

Sorry, my bad...I thought I did.. I clearly need a holiday.. I hope you can see the files

Cheers

E

haploSplit_ref_HOY2.out.log haploSplit_ref_HOY2.err.log

andreaminio commented 2 years ago

Hi Elena,

It seems it can no longer find a sequence in the query database... which is weird, since that info is only read by the code. So I edited the code to reload the sequences before it starts fixing the orientation against the guide genome for draft sequences with only one marker. This should leave the code no room for error; please give it a go.

Andrea

hrpelg commented 2 years ago

Hi Andrea,

I get the same issue. Let me try one more time. I got the first error because my marker positions are floating-point numbers (for example 28.5, referring to cM positions), but your code has int(pos) at lines 240 and 241, so I changed it to float(pos) and re-ran it. I am getting the same error as before for that same scaffold. Might the error be related to that?

E
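For reference, the int(pos) to float(pos) change can be illustrated with a minimal Python sketch. The tab-separated layout (chromosome, position, marker ID) and the function name `parse_marker_line` are assumptions made for the example; the actual code at lines 240 and 241 of HaploSplit.py is not shown in this thread.

```python
def parse_marker_line(line):
    """Parse 'chromosome<TAB>position<TAB>marker_id' (assumed layout).

    float() accepts both '28' and '28.5', whereas int('28.5') raises
    ValueError, which is why cM positions need the float conversion.
    """
    chrom, pos, marker_id = line.rstrip("\n").split("\t")[:3]
    return chrom, float(pos), marker_id


# Made-up marker lines mixing an integer and a cM-style position.
lines = ["Chr17\t28.5\tmarker_2\n", "Chr17\t28\tmarker_1\n"]
markers = sorted(parse_marker_line(l) for l in lines)
```

Sorting and comparing positions still works after the change, since Python compares floats the same way as integers. Whether any other part of HaploSplit relies on positions being integers is not confirmed here.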

hrpelg commented 1 year ago

Hi Andrea,

No worries. Thank you so much for your time. Attached are the log and error files.

Cheers

Elena
