AndersenLab / cegwas

Pipeline for performing GWAS mappings with C. elegans phenotype data

process_pheno issues #14

Closed synmuv closed 9 years ago

synmuv commented 9 years ago

I'm trying to run a trait with a bunch of old strains and replicate isotypes. Here are the errors:

First

Error in process_pheno(out) : Missing isotypes for the following strains: CB3197, CB3199, DR1350, JU262, JU362, JU395, MY12, MY14, MY17, MY20, MY3, MY5, PX176, PX178, TR389, TR403 No known isotype! Please remove strain(s).

This error is a good one, because it is detecting isotypes. Unfortunately, nearly all of these strains have known isotypes. How do we update the list?

Second

Once I removed these "unknown" isotypes (just to be able to move on), process_pheno threw the error:

Error: Duplicate identifiers for rows (2, 3), (23, 24), (27, 28), (29, 30), (31, 33), (32, 34), (48, 51), (49, 50), (56, 58)

This happens because some of the remaining strains share an isotype, for example AB2 and AB4. We need to be able to deal with this situation. What should we do?

Remove one of the strains per isotype? Average the phenotypes per isotype?

danielecook commented 9 years ago

Please provide the data you are using for mapping so I can test. There is a file in the data-raw folder that can be used to edit isotypes.

danielecook commented 9 years ago

We can allow the user to choose whether to keep one strain per isotype or average them. By default I would suggest we average.

synmuv commented 9 years ago

I just sent the data to you. Yes, I like the option to choose one or average, with average as the default. Could you also please add a message saying that strains with non-standard isotypes (x, y, z, etc.) were scored and that the phenotypes of the following strains were averaged?

danielecook commented 9 years ago

This issue should be addressed now. There is an option, remove_strains, which can be used to automatically remove strains that do not have isotypes. Additionally, you can choose to average or take the first strain when there are multiple strains per isotype. Please test, as I had to do quite a bit to get this to work.
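
For readers following the thread, the behavior described above can be sketched roughly as follows. This is a minimal illustration in Python, not the cegwas implementation (the package itself is written in R). The function name `collapse_by_isotype`, the `duplicate_method` argument, the isotype labels, and the strain `STRAIN_X` are all hypothetical; only the `remove_strains` option name and the average-or-first choice come from the discussion.

```python
from statistics import mean

# Hypothetical strain -> isotype table, for illustration only.
# The isotype labels are placeholders, not real isotype assignments.
STRAIN_TO_ISOTYPE = {"AB2": "iso_1", "AB4": "iso_1", "N2": "iso_2"}


def collapse_by_isotype(phenotypes, strain_to_isotype,
                        remove_strains=True, duplicate_method="average"):
    """Collapse (strain, value) pairs to one value per isotype.

    remove_strains: drop strains with no known isotype instead of failing.
    duplicate_method: "average" or "first" (hypothetical argument name).
    Returns (dict of isotype -> value, list of informational messages).
    """
    if duplicate_method not in ("average", "first"):
        raise ValueError("duplicate_method must be 'average' or 'first'")

    # Strains that cannot be assigned to any isotype.
    missing = [s for s, _ in phenotypes if s not in strain_to_isotype]
    if missing and not remove_strains:
        raise ValueError(
            "Missing isotypes for the following strains: " + ", ".join(missing)
        )

    messages = []
    if missing:
        messages.append("Removed strains without isotypes: " + ", ".join(missing))

    # Group the remaining rows by isotype, preserving input order.
    grouped = {}
    for strain, value in phenotypes:
        if strain in strain_to_isotype:
            grouped.setdefault(strain_to_isotype[strain], []).append((strain, value))

    result = {}
    for isotype, rows in grouped.items():
        values = [v for _, v in rows]
        if len(rows) > 1:
            strains = ", ".join(s for s, _ in rows)
            messages.append(
                f"Isotype {isotype}: strains {strains} combined using '{duplicate_method}'"
            )
        result[isotype] = mean(values) if duplicate_method == "average" else values[0]
    return result, messages


if __name__ == "__main__":
    data = [("AB2", 1.0), ("AB4", 3.0), ("N2", 5.0), ("STRAIN_X", 9.0)]
    collapsed, notes = collapse_by_isotype(data, STRAIN_TO_ISOTYPE)
    print(collapsed)
    for note in notes:
        print(note)
```

Returning the messages alongside the result mirrors the request earlier in the thread to report which strains were dropped and which were combined, so the caller can decide how to surface them.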