corehunter / corehunter3

Core Hunter 3: a flexible core subset selection tool
http://www.corehunter.org
Apache License 2.0

Genotypes Input file error: "Marker matrix should be numeric (0, 1, 2)." #112

Open LivGilpin opened 2 years ago

LivGilpin commented 2 years ago

I get an error message saying that my input file is not numeric. I have 346 accessions and 13,105 SNPs in my data frame. This is the script I used:

install.packages("corehunter")
library(corehunter)

#Load genotype file
Markers<-read.csv(file="GenotypesCoreHunter.csv",
                  header=TRUE,row.names=1,sep=";",
                  stringsAsFactors=FALSE,check.names=FALSE)

# bi-allelic data (e.g. SNP)
head(Markers,c(5,3))
        AFL2     CONS61   EV_Eve1_R422
35-1     1      1                2
35-10   1      1                2
35-13   0      1                2
35-16   1      1                1
35-22   1      0                2

I ran a test to check that the data frame was loaded into R as numeric:

sapply(Markers[,6000:7000],is.numeric) 
>[1] TRUE TRUE TRUE TRUE TRUE ...

geno.data <- genotypes(Markers, format = "biparental" )
>Error in genotypes(Markers, format = "biparental") : 
>Marker matrix should be numeric (0, 1, 2).

Since I still got the numeric error, I tried converting the data frame to a matrix:

`MarkersMatrix <-as.matrix(Markers)`

When running the genotypes function again, I still got the error message:
geno.data <- genotypes(MarkersMatrix, format = "biparental" )
#>Error in genotypes(Markers, format = "biparental") : 
#>Marker matrix should be numeric (0, 1, 2).

I tried the as.numeric function, but the same error came up:

as.numeric(MarkersMatrix)
geno.data <- genotypes(MarkersMatrix, format = "biparental" )
>Error in genotypes(Markers, format = "biparental") : 
>Marker matrix should be numeric (0, 1, 2).

hdbeukel commented 2 years ago

Hi @LivGilpin,

Is it possible for you to share (part of) your data file so we can try it out? You can anonymise the data if needed.

LivGilpin commented 2 years ago

Dear Herman,

Thank you for your quick reply! It is very interesting to work with Core Hunter, and I think it can become very useful for us.

I was able to find the error in the input file: missing values were coded as 3 instead of being left empty.
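A minimal sketch of how such a coding problem might be spotted and fixed in R before calling `genotypes`. The small `Markers` data frame below is made up for illustration, and the sketch assumes that 3 is the only missing-value code and that missing genotypes should become `NA`.

```r
# Toy stand-in for the real marker table (hypothetical values);
# 3 is used here as the missing-value code, as in the original file.
Markers <- data.frame(
  AFL2   = c(1, 1, 0, 3),
  CONS61 = c(1, 3, 1, 0),
  row.names = c("35-1", "35-10", "35-13", "35-16")
)

# is.numeric() is TRUE for every column even though 3 is not a valid
# genotype, so inspect the set of values actually present instead.
values <- sort(unique(unlist(Markers)))
print(values)

# Recode the missing-value indicator as NA.
Markers[Markers == 3] <- NA

# Every remaining non-missing value should now be 0, 1 or 2.
stopifnot(all(unlist(Markers) %in% c(0, 1, 2, NA)))
```

Alternatively, passing `na.strings = c("NA", "3")` to `read.csv` should treat 3 as missing at load time, so no recoding step is needed afterwards.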

However, new questions have arisen:

I am working with the Norwegian Apple Germplasm Collection and I am somewhat uncertain on how to read the results from CH3:

     sel      EN.MR     EN.GD
1    35_4     0.382448  0.062145
2    35_13    0.382448  0.062145
3    35_34    0.382448  0.062145
4    36_1     0.382448  0.062145
5    36_4     0.382448  0.062145
…
66   50_10    0.382448  0.062145
67   50_16    0.382448  0.062145
68   50_28    0.382448  0.062145
69   50_31    0.382448  0.062145

As I have understood, CH3 constructs core collections with high diversity (high entry-to-nearest-entry distance; E-NE), and in this case, the default settings produced 69 accessions with high diversity.

However, the genetic diversity parameters commonly used in my field include: number of different alleles (Na), number of effective alleles (Ne), Shannon’s information index (I), observed heterozygosity (Ho), expected heterozygosity (He), unbiased expected heterozygosity (uHe) and inbreeding coefficient (F).

Is it possible to retrieve these parameters from CH3?

Best regards,
Liv Hatleli Gilpin
PhD student, fruit breeding
Faculty of Biosciences
Norwegian University of Life Sciences

hdbeukel commented 2 years ago

Hi @LivGilpin

The size of the core collection is an input parameter for Core Hunter: you should set the desired core collection size when running `sampleCore`. It defaults to 20% of the full collection, which is indeed 69 in your case. Core Hunter selects exactly the specified number of accessions, trying to maximise the diversity of a selection of this size.

You can maximise a variety of genetic diversity measures; see http://www.corehunter.org/measures for more info. By default Core Hunter maximises entry-to-nearest-entry distance, but if it makes more sense in your field to maximise e.g. Shannon's index or expected heterozygosity, these are available too. Use the `obj` parameter of `sampleCore` to change the evaluation measure. Some examples are provided in the documentation of `sampleCore` (try `?sampleCore` in your R terminal). You can even combine multiple measures into a weighted objective function if desired.

If you want to evaluate a core collection with another diversity measure than the one used to sample the core, you can use the function evaluateCore. It supports the same measures as those accepted by sampleCore.
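The options described above can be sketched roughly as follows. This is an illustration only: it uses the `exampleData()` set bundled with the package instead of the apple data, a core size of 20 and equal weights chosen arbitrarily, and `mode = "fast"` merely to keep the run short.

```r
library(corehunter)

# Example dataset bundled with the package; in practice this would be
# the object returned by genotypes(...) for your own marker file.
data <- exampleData()

# Core of exactly 20 accessions, maximising expected heterozygosity (HE).
core.he <- sampleCore(data, obj = objective("HE"), size = 20, mode = "fast")

# Weighted combination of two measures: Shannon's index (SH) and
# entry-to-nearest-entry distance (EN) based on Cavalli-Sforza and
# Edwards distances (CE). The weights are illustrative.
core.multi <- sampleCore(
  data,
  obj = list(objective("SH", weight = 0.5),
             objective("EN", "CE", weight = 0.5)),
  size = 20,
  mode = "fast"
)

# Score the HE-optimised core with a measure it was not sampled for.
sh.value <- evaluateCore(core.he, data, objective("SH"))
print(sh.value)
```

The selected accessions are available as `core.he$sel`, so the same core can be scored with as many additional measures as needed.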

LivGilpin commented 2 years ago

Thank you! I will test out different obj parameters.

LivGilpin commented 2 years ago

Thank you for your help @hdbeukel.

Is there a way to produce a report for each locus, showing the number of alleles, Ne, Ho, He, SI, etc. per locus?

Liv

hermandebeukelaer commented 2 years ago

Core Hunter can't produce these detailed reports, only aggregates for the selected core collection. I'm sure there are other R packages that can do this for you, but I am not the expert there. Or you can write your own functions.
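As a rough idea of what "your own functions" could look like, here is a base R sketch that computes the usual per-locus statistics for bi-allelic markers. It is not part of Core Hunter. The function name `locusStats` and the toy data are hypothetical, and it assumes the 0/1/2 coding counts copies of one allele, so that 1 denotes a heterozygote.

```r
# Per-locus diversity statistics for bi-allelic markers coded 0/1/2.
# Assumes the code counts copies of one allele, so 1 is a heterozygote.
locusStats <- function(geno) {
  geno <- as.matrix(geno)
  stats <- apply(geno, 2, function(g) {
    g <- g[!is.na(g)]                 # drop missing genotypes
    n <- length(g)
    p <- sum(g) / (2 * n)             # frequency of the counted allele
    freq <- c(p, 1 - p)
    freq <- freq[freq > 0]            # alleles actually observed
    He <- 1 - sum(freq^2)             # expected heterozygosity
    Ho <- mean(g == 1)                # observed heterozygosity
    c(Na  = length(freq),             # number of different alleles
      Ne  = 1 / sum(freq^2),          # number of effective alleles
      I   = -sum(freq * log(freq)),   # Shannon's information index
      Ho  = Ho,
      He  = He,
      uHe = He * 2 * n / (2 * n - 1), # unbiased expected heterozygosity
      F   = if (He > 0) 1 - Ho / He else NA)  # inbreeding coefficient
  })
  as.data.frame(t(stats))
}

# Toy data (hypothetical): 4 accessions x 2 markers, one missing value.
toy <- data.frame(M1 = c(0, 1, 1, 2), M2 = c(2, 2, 2, NA),
                  row.names = c("a1", "a2", "a3", "a4"))
res <- locusStats(toy)
print(round(res, 3))
```

To report on a sampled core only, the marker table could be subset first, e.g. `locusStats(Markers[core$sel, ])`, assuming `core$sel` holds accession IDs that match the row names of `Markers`.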

@daveneti can you share your thoughts on this?

daveneti commented 1 year ago

I would need to look into it as well. I suggest you have a look at CRAN for packages that do this. There are a few different ones, but I do not have a great deal of experience with them.

Guy

LivGilpin commented 1 year ago

Hi.

I am trying to decide which genetic distance measure to use for our apple germplasm diversity analysis. Based on the table below, which one would you recommend?

Also, as a measure of diversity, would HE or SH be better suited?

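| Sampling strategy | Subset code | Subset size | Allele coverage (CV) | Number of alleles (How to calculate? R?) | Expected proportion heterozygous loci (HE) | EN (Modified Rogers) | AN (Modified Rogers) | EE (Modified Rogers) | EN (Cavalli-Sforza and Edwards) | AN (Cavalli-Sforza and Edwards) | EE (Cavalli-Sforza and Edwards) | Shannon's index (SH) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Whole collection | WHC | 344 | 1 | | 0,365 | 0,266 | 0,000093 | 0,425 | 0,285 | 0,000093 | 0,446 | 10,024 |
| Core Hunter single | CHs69 | 69 | 1 | | 0,382 | 0,404 | 0,241 | 0,452 | 0,426 | 0,259 | 0,471 | 10,040 |
| Core Hunter single | CHs50 | 50 | 1 | | 0,385 | 0,413 | 0,264 | 0,456 | 0,434 | 0,281 | 0,474 | 10,048 |
| Core Hunter single | CHs40 | 40 | 1 | | 0,388 | 0,419 | 0,277 | 0,458 | 0,439 | 0,295 | 0,477 | 10,051 |
| Core Hunter single | CHs30 | 30 | 1 | | 0,392 | 0,426 | 0,294 | 0,462 | 0,446 | 0,312 | 0,480 | 10,056 |
| Core Hunter single | CHs20 | 20 | 1 | | 0,398 | 0,435 | 0,313 | 0,466 | 0,455 | 0,333 | 0,484 | 10,062 |
| Core Hunter single | CHs10 | 10 | 0,9997 | | 0,406 | 0,453 | 0,341 | 0,475 | 0,472 | 0,362 | 0,492 | 10,071 |

Core Hunter multi (SH + EN using Cavalli-Sforza and Edwards):

| Subset code | Subset size | EN | SH |
|---|---|---|---|
| CHm69 | 69 | 0,386 | 10,042 |
| CHm50 | 50 | 0,384 | 10,046 |
| CHm40 | 40 | 0,394 | 10,046 |
| CHm30 | 30 | 0,396 | 10,048 |
| CHm20 | 20 | 0,443 | 10,037 |
| CHm10 | 10 | 0,445 | 10,042 |

B/R
Liv Gilpin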


— Reply to this email directly, view it on GitHubhttps://github.com/corehunter/corehunter3/issues/112#issuecomment-1209511945, or unsubscribehttps://github.com/notifications/unsubscribe-auth/A2LGWZZ3HAEGNYPI5HOYFU3VYJYKXANCNFSM55OOZPRA. You are receiving this because you were mentioned.Message ID: @.***>