ctlab / GADMA

Genetic Algorithm for Demographic Model Analysis
Other
47 stars 14 forks source link

Error running gadma #64

Closed SamWittwer closed 2 years ago

SamWittwer commented 2 years ago

Hello! Love the idea of the software but I'm struggling to get gadma running.

After getting an error with my own data, I've downloaded the data from Portik et al from their github here.

I've set up a params file following this and just adapting the population names (North and South instead of CVLN, CVLS)

I am getting the same error as with my own data: Traceback (most recent call last): File "/home/ubuntu/anaconda3/bin/gadma", line 8, in <module> sys.exit(main()) File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/gadma/core/core.py", line 145, in main print_runs_summary(start_time, shared_dict, settings_storage) File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/gadma/core/draw_and_generate_code.py", line 257, in print_runs_summary x_translated = engine.model.translate_values( File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/gadma/models/demographic_model.py", line 143, in translate_values tr_value = var.translate_value_into(units=units, File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/gadma/utils/variables.py", line 280, in translate_value_into raise ValueError("Set Nanc value for translation") ValueError: Set Nanc value for translation

Do you have any pointers for me to debug this? Thank you very much in advance for this very promising software!

noscode commented 2 years ago

Dear @SamWittwer,

Thank you very much for your interest and feedback! You are absolutely correct about that error: it is bug and it will be fixed in the nearest release. In essence, it happens when there is no Mutation rate and Sequence length set. GADMA does not understand that and tries to transform demographic values from genetic units to physical and therefore fails.

Right now you can fix it the following way: set Relative parameters to True in your params_file. Sorry for the inconvenience, it will be fixed soon! Please tell me how it will work.

SamWittwer commented 2 years ago

Dear Ekaterina

Thank you for getting back to me so quickly! I just finished a (very) small run and it appears to work now. Once I have figured out the nuances, GADMA will be a real gamechanger for me!

I have a few follow-up questions, sorry to be a bother.

noscode commented 2 years ago

Dear @SamWittwer,

Glad to hear that everything is working! Feel free to ask questions.

Sequence length in the parameter file would correspond to L in dadi?

Yes, you are absolutely right. That is the length of sequence that was sequenced and processed to get SNP's. There are some tips here.

Assuming I run the GADMA pipeline without sequence length and mutation rate and end up with genetic units, can I convert those back into real units following the parameter definitions in dadi using a mutation rate and L?

Yes, sure. Here is some documentation. Theta0 is equal to 4(mutation rate)(seq_length) and by default it is equal to 1 (i.e. units are genetic). When you want to change mutation rate or sequence length then you just follow those documentation about changing theta0.

After running GADMA and obtaining the 'best' model with its corresponding parameters, do those need an additional local optimization (e.g. via Portiks dadi_pipeline)?

No, GADMA launches global search firstly and then local search as well. So there is no need in additional local optimization after it.

SamWittwer commented 2 years ago

Hello @noscode

So far so good with the analysis but I'm running into issues with runaway parameters tracking their upper bounds with biologically unrealistic estimates (had a similar issue when running Portiks pipeline). One thing I'm trying right now would be different and more conservative parameter bounds but I'm not entirely sure based on the manual how to properly specify them.

Assuming I have a model starting at (1,1,1) up to (1,2,2) and I only want to change the upper bound for the T parameters, how should I specify this in the parameter file? I'm thinking: Lower bound: 0.001 Upper bound: 0.5 Parameter identifiers: T

but I am not sure if this will be correct

Cheers and happy weekend!

Sam

noscode commented 2 years ago

Hi @SamWittwer

I am glad to hear from you. Unfortunately sometimes methods go to the upper or lower bounds. In order to change bounds you need some additional settings that are located in the extra parameters file. Time is usually between values [1e-15, 5]. You can set bounds of time the following way:

min_T: 1e-15 max_T: 10 <- it will change the upper bound to 10.

This lines could be put in usual file with settings, no need in additional extra params file. Hope that will help!

Ekaterina