diyabc / diyabcGUI

User interface to DIYABC/AbcRanger
https://diyabc.github.io/

[training set simu][SNP IndSeq] interface should advise user for "Locus $$$ in pop $$ has only missing datas (value 9)" #86

Status: Closed (HudsonJamie closed this issue 3 years ago)

HudsonJamie commented 3 years ago

Good afternoon,

Thank you for this new iteration of DIYABC! It is great to see this up and running.

I have a quick question regarding the number of loci used to produce our training set. In the manual, it is suggested that scenarios using 5000 to 20000 SNP loci are sufficient to obtain robust results. The IndSeq RAD dataset I am working with has ~2000 neutral SNP loci, and when I attempt to simulate using this number of loci (i.e. the default number that appears after filtering for MAF) I am met with a "diyabc simu run exit status: -11" error in the shiny app. I am only able to get the simulations to run when I set the number of loci to something very low (such as 50) and then progressively use a sliding window approach (i.e. start at locus 1, then locus 51, then locus 101, etc.). I initially thought this was a memory allocation issue; however, the small size of my data makes me question this. Do you think something in my input file (i.e. poor quality data) could be causing diyabc-rf to fail when I try to simulate a "large" number of loci from my dataset, or is there something built in that stops the user from creating a training set on too large a proportion of the loci in the raw data? I have attached the log file that is produced in case this helps.

Kind regards, Jamie

Attached: diyabc_run_call.log

gdurif commented 3 years ago

Hi Jamie,

Thanks for the report. There is no hard-coded limitation and it shouldn't be an issue to simulate ~2000 SNP loci.

For debugging purposes (and debugging purposes only), would you be open to sharing your data file (and the configuration you were trying to simulate) so that we can replicate the issue?

If you prefer, we can communicate by email, and you can share your data file through a safer and more confidentiality-friendly channel.

Best

HudsonJamie commented 3 years ago

Hi Ghislain,

Thanks for the speedy reply! In my simple attempt at debugging, I managed to get my hands on a computer that could run the old DIYABC GUI, and my .snp file brought up the error "Locus $$$ in pop $$ has only missing datas (value 9)". Removing all loci that were fully missing in any population has removed the "exit status: -11" error, and I'm able to run the diyabc-rf shiny app fine. I apologise that this was a simple mistake on my side! If you would like, I can still email across a data file that fails with the error, though it should be simple to replicate now that I think we know what was causing the issue.

Many thanks, Jamie
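For anyone hitting the same "exit status: -11" error, the workaround described above (dropping every locus that is fully missing in at least one population) could be sketched roughly as follows. This is only an illustration, not code from DIYABC or diyabcGUI: it assumes the genotypes have already been parsed into one list of integers per individual, with `9` as the missing code quoted in the error message, and the names `fully_missing_loci` and `drop_loci` are hypothetical.

```python
from collections import defaultdict

MISSING = 9  # missing-genotype code quoted in the old GUI's error message


def fully_missing_loci(pops, genotypes):
    """Return sorted 0-based indices of loci that have only missing
    values in at least one population.

    pops      -- population label of each individual (assumed input layout)
    genotypes -- one list of integer genotype codes per individual
    """
    n_loci = len(genotypes[0]) if genotypes else 0
    # has_data[pop][j] becomes True once any individual of pop is typed at locus j
    has_data = defaultdict(lambda: [False] * n_loci)
    for pop, row in zip(pops, genotypes):
        flags = has_data[pop]
        for j, g in enumerate(row):
            if g != MISSING:
                flags[j] = True
    return sorted(
        {j for flags in has_data.values() for j, ok in enumerate(flags) if not ok}
    )


def drop_loci(genotypes, to_drop):
    """Return a copy of the genotype matrix without the listed loci."""
    drop = set(to_drop)
    return [[g for j, g in enumerate(row) if j not in drop] for row in genotypes]


# Toy data: the second locus (index 1) is entirely missing in population "1".
pops = ["1", "1", "2", "2"]
genotypes = [
    [0, 9, 2],
    [1, 9, 9],
    [2, 1, 0],
    [9, 0, 1],
]
bad = fully_missing_loci(pops, genotypes)
filtered = drop_loci(genotypes, bad)
print(bad)
print(filtered)
```

A locus is dropped even if only one population lacks data for it, which matches what resolved the error here.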

gdurif commented 3 years ago

Thank you very much for the debugging. I forgot to handle this case; I will fix it in the next release by adding a warning in the interface.

gdurif commented 3 years ago

Comment:

To be detected when parsing the observed dataset. Display a warning when one or more loci contain only missing data (only '9' for SNPs; only '000' or '000000' for microsatellites; only <[]> or <[]> <[]> for sequences; see section 7.1 "datafile" of the manual), even when the missing data affect a single population (the statistics are computed for all populations with the same number of loci and the same loci).

In short, a message saying: "Warning: locus x in pop y contains only missing data; remove locus x from your dataset".
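The check described in this comment could look roughly like the sketch below. It is a hedged illustration only, not the code that went into diyabcGUI: it assumes the observed dataset has already been parsed into one list of string values per individual plus a per-locus marker type, it numbers loci from 1, and the names `MISSING_TOKENS`, `is_missing` and `missing_data_warnings` are hypothetical. The missing-data tokens are the ones listed in the comment; the non-missing values in the toy data are made up.

```python
# Missing-data tokens per marker type, as listed in the comment above.
MISSING_TOKENS = {
    "snp": {"9"},
    "microsat": {"000", "000000"},
    "sequence": {"<[]>", "<[]><[]>"},
}


def is_missing(value, locus_type):
    """True if value is a missing-data token for this marker type."""
    # Strip whitespace so "<[]> <[]>" and "<[]><[]>" compare equal.
    return "".join(value.split()) in MISSING_TOKENS[locus_type]


def missing_data_warnings(pops, rows, locus_types):
    """Return one warning per (locus, population) pair with only missing data.

    pops        -- population label of each individual (assumed input layout)
    rows        -- one list of string values per individual, one value per locus
    locus_types -- marker type of each locus: "snp", "microsat" or "sequence"
    """
    warnings = []
    pop_labels = list(dict.fromkeys(pops))  # unique labels, first-seen order
    for j, locus_type in enumerate(locus_types):
        for pop in pop_labels:
            values = [row[j] for p, row in zip(pops, rows) if p == pop]
            if all(is_missing(v, locus_type) for v in values):
                warnings.append(
                    f"Warning: locus {j + 1} in pop {pop} contains only "
                    f"missing data; remove locus {j + 1} from your dataset"
                )
    return warnings


# Toy data: locus 1 is fully missing in pop 1, locus 2 in pop 2.
pops = ["1", "1", "2", "2"]
rows = [
    ["000000", "<[ACGT]>"],
    ["000000", "<[]>"],
    ["120124", "<[]>"],
    ["000000", "<[]>"],
]
warnings = missing_data_warnings(pops, rows, ["microsat", "sequence"])
for message in warnings:
    print(message)
```

The warning fires even when a single population is affected, as the comment requires.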

gdurif commented 3 years ago