emmanuelparadis / psmcr

R Port of psmc
Other
12 stars 4 forks source link

compatibility with R #3

Closed selasphoruskershaw closed 3 years ago

selasphoruskershaw commented 3 years ago

Hi,

I've tried installing this package using R version 3.6 and 4.0, but for both I receive the error "package 'psmcr' is not available (for R version xxx). Which version of R should I have installed? Thanks.

emmanuelparadis commented 3 years ago

Hi,

psmcr is only on GH, not on CRAN. What command did you use?

emmanuelparadis commented 3 years ago

See also the previous question about installing from GH: https://github.com/emmanuelparadis/psmcr/issues/2#issuecomment-652791243

selasphoruskershaw commented 3 years ago

Thanks for your response. I have it successfully installed now, but now, when I use the VCF2DNAbin script, it tells me "mismatch between chromosomes", when I am sure that the reference I am attaching is the correct one for my data.

I tried a non-gzipped version of my reference: VCF2DNAbin("borneo1.vcf.gz", refgenome = "GCF_000247815.1_FicAlb1.5_genomic.fna") I tried a gzipped version of my reference: VCF2DNAbin("borneo1.vcf.gz", refgenome = "GCF_000247815.1_FicAlb1.5_genomic.fna.gz")

Looking forward to hearing from you.

Thanks, Brian

On Sat, Jan 23, 2021 at 11:37 PM Emmanuel Paradis notifications@github.com wrote:

See also the previous question about installing from GH: #2 (comment) https://github.com/emmanuelparadis/psmcr/issues/2#issuecomment-652791243

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-766305912, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXDHNIOKL4FLEOX2M7DS3PE2LANCNFSM4WQFLU6Q .

selasphoruskershaw commented 3 years ago

Also, it seems important to add that I used the same reference and same individual to get psmc to work in terminal. I just had trouble getting bootstrapping to work properly which is why I am trying psmcr.

On Mon, Jan 25, 2021 at 9:43 AM Brian Myers bmyers62622@gmail.com wrote:

Thanks for your response. I have it successfully installed now, but now, when I use the VCF2DNAbin script, it tells me "mismatch between chromosomes", when I am sure that the reference I am attaching is the correct one for my data.

I tried a non-gzipped version of my reference: VCF2DNAbin("borneo1.vcf.gz", refgenome = "GCF_000247815.1_FicAlb1.5_genomic.fna") I tried a gzipped version of my reference: VCF2DNAbin("borneo1.vcf.gz", refgenome = "GCF_000247815.1_FicAlb1.5_genomic.fna.gz")

Looking forward to hearing from you.

Thanks, Brian

On Sat, Jan 23, 2021 at 11:37 PM Emmanuel Paradis < notifications@github.com> wrote:

See also the previous question about installing from GH: #2 (comment) https://github.com/emmanuelparadis/psmcr/issues/2#issuecomment-652791243

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-766305912, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXDHNIOKL4FLEOX2M7DS3PE2LANCNFSM4WQFLU6Q .

emmanuelparadis commented 3 years ago

Can you please paste here exactly the command you used and the output?

selasphoruskershaw commented 3 years ago

Here is the command I used in R Studio. The error message is in red. Thank you for your help.

library(psmcr) VCF2DNAbin("borneo1.vcf.gz", refgenome = "barn_noflycatch.fna") Scanning 211.194 MBError in VCF2DNAbin("borneo1.vcf.gz", refgenome = "barn_noflycatch.fna") : mismatch between chromosome names

On Tue, Jan 26, 2021 at 2:29 AM Emmanuel Paradis notifications@github.com wrote:

Can you please paste here exactly the command you used and the output?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-767451521, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXD6NP74GP25WLIJUJLS32KR7ANCNFSM4WQFLU6Q .

emmanuelparadis commented 3 years ago

Thanks. There seems to be a mismatch between the chromosome names in both files. Can you try this code with your files?

library(pegas) # loads also ape
info <- VCFloci("borneo1.vcf.gz")
ref <- read.FASTA("barn_noflycatch.fna")
unique(info$CHROM)
names(ref)

The last two lines should give similar names, although not necessarily in the same order, so you may do also:

sort(unique(info$CHROM))
sort(names(ref))
selasphoruskershaw commented 3 years ago

Thanks for your response-sorry for my late reply-I've been waiting for an update to my dataset to finish.

I've now ensured that my reference genome and vcf have the same chromosome names, and the VCF2DNAbin command appears to run successfully, although I do get this message at the end: "Warning: In vol + nY : NAs produced by integer overflow".

A "ref" DNA bin is produced, however, the next command I use to try to make the psmc input file does not work. I think this is the command that is supposed to produce the input file for psmcR unless I am mistaken. Here's what I used, where "ref" is what was produced by the VCF2DNAbin command: "seqBinning("ref", bin.size=100)". Here is the error I received: "Error in seqBinning("ref", bin.size = 100) : 'ref' must be of class DNAbin.

I thought this was of the DNAbin class, so I am confused. Thanks again for your help-it seems things are getting closer!

On Wed, Jan 27, 2021 at 7:33 PM Emmanuel Paradis notifications@github.com wrote:

Thanks. There seems to be a mismatch between the chromosome names in both files. Can you try this code with your files?

library(pegas) # loads also apeinfo <- VCFloci("borneo1.vcf.gz")ref <- read.FASTA("barn_noflycatch.fna") unique(info$CHROM) names(ref)

The last two lines should give similar names, although not necessarily in the same order, so you may do also:

sort(unique(info$CHROM)) sort(names(ref))

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-768775088, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXD73NJAVX5Y6NDOZZLS4DLIRANCNFSM4WQFLU6Q .

emmanuelparadis commented 3 years ago

About "Warning: In vol + nY : NAs produced by integer overflow": it seems you have have a big file, more than 2.1 GB (well not that big in today's norms... ). I'll arrange that but no worry: vol is used only to inform the user of the progress of the calculations.

About the error: maybe ref should not be quoted? Did you try:

seqBinning(ref, bin.size=100)

Especially if ref is the output of this command:

ref <- VCF2DNAbin(............
selasphoruskershaw commented 3 years ago

Hello,

I successfully got psmcr to run! Now I'm running into issues with the plotting function to visualize my results. If I want to visualize my results with a generation time of 1 and a mutation rate of 0.2e-8, what code should I use? I ran 20 iterations and 100 bootstraps if that is helpful.

Thanks again for your help!

On Thu, Feb 4, 2021 at 8:48 PM Emmanuel Paradis notifications@github.com wrote:

About "Warning: In vol + nY : NAs produced by integer overflow": it seems you have have a big file, more than 2.1 GB (well not that big in today's norms... ). I'll arrange that but no worry: vol is used only to inform the user of the progress of the calculations.

About the error: maybe ref should not be quoted? Did you try:

seqBinning(ref, bin.size=100)

Especially if ref is the output of this command:

ref <- VCF2DNAbin(............

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-773785173, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXAJI3E2DHJ5OEHSV63S5N2ALANCNFSM4WQFLU6Q .

emmanuelparadis commented 3 years ago

It seems you did the hardest part! Make sure you have the latest version of psmcr from GH for the plots (a bug with the scales was fixed about three months ago). Then try:

plot(output, mutation.rate = 0.2e-8)
plot(output, mutation.rate = 0.2e-8, scaled = TRUE)

If you modified the default value of bin.size in seqBinning(), you have to modify that in the above plot calls too.

selasphoruskershaw commented 3 years ago

Thanks for the quick reply!

I used the following code: plot("o", mutation.rate=0.2e-8)

Then I received the following error messages: "Error in plot.window(...) : need finite 'ylim' values In addition: Warning messages: 1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion 2: In min(x) : no non-missing arguments to min; returning Inf 3: In max(x) : no non-missing arguments to max; returning -Inf 4: In plot.window(...) : "mutation.rate" is not a graphical parameter"

I scaled the bin.size to 100. Is that the issue? How should I correct my code? Thank you!

On Wed, Feb 17, 2021 at 8:39 PM Emmanuel Paradis notifications@github.com wrote:

It seems you did the hardest part! Make sure you have the latest version of psmcr from GH for the plots (a bug with the scales was fixed about three months ago). Then try:

plot(output, mutation.rate = 0.2e-8) plot(output, mutation.rate = 0.2e-8, scaled = TRUE)

If you modified the default value of bin.size in seqBinning(), you have to modify that in the above plot calls too.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-781044587, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXFMICQ4MLZMFGDZHALS7SKZXANCNFSM4WQFLU6Q .

selasphoruskershaw commented 3 years ago

And just to verify, the default is a generation time of 1, so I don't have to input anything?

On Wed, Feb 17, 2021 at 8:47 PM Brian Myers bmyers62622@gmail.com wrote:

Thanks for the quick reply!

I used the following code: plot("o", mutation.rate=0.2e-8)

Then I received the following error messages: "Error in plot.window(...) : need finite 'ylim' values In addition: Warning messages: 1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion 2: In min(x) : no non-missing arguments to min; returning Inf 3: In max(x) : no non-missing arguments to max; returning -Inf 4: In plot.window(...) : "mutation.rate" is not a graphical parameter"

I scaled the bin.size to 100. Is that the issue? How should I correct my code? Thank you!

On Wed, Feb 17, 2021 at 8:39 PM Emmanuel Paradis notifications@github.com wrote:

It seems you did the hardest part! Make sure you have the latest version of psmcr from GH for the plots (a bug with the scales was fixed about three months ago). Then try:

plot(output, mutation.rate = 0.2e-8) plot(output, mutation.rate = 0.2e-8, scaled = TRUE)

If you modified the default value of bin.size in seqBinning(), you have to modify that in the above plot calls too.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-781044587, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXFMICQ4MLZMFGDZHALS7SKZXANCNFSM4WQFLU6Q .

emmanuelparadis commented 3 years ago

I guess that should be plot(o) not plot("o").

selasphoruskershaw commented 3 years ago

When I tried plot(o) I received this message: "Error: C stack usage 7969904 is too close to the limit"

Is this an issue you've seen before?

On Wed, Feb 17, 2021 at 8:54 PM Emmanuel Paradis notifications@github.com wrote:

I guess that should be plot(o) not plot("o").

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-781051411, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXBKQZ4XDPCVXVMRNMLS7SMPLANCNFSM4WQFLU6Q .

emmanuelparadis commented 3 years ago

Can you check str(o)? It should be a (relatively long ) list with various components. Also class(o) should print "psmc".

selasphoruskershaw commented 3 years ago

class(o) did print "psmc"

Here's what str(o) output:

List of 12 $ niters : num 20 $ n : num 23 $ n_free_lambdas: int 7 $ logLik : num [1:21] 0 -3375322 -3369458 -3362839 -3355858 ... $ EMQ : num [1:21, 1:2] 0 -7204 -8401 -9749 -11197 ... ..- attr(, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:2] "before" "after" $ RI : num [1:21] Inf 0.0374 0.0361 0.0359 0.0359 ... $ Cpi : num [1:21] 1 1.37 1.8 2.32 2.87 ... $ Nrecomb : num [1:21] 96594 116672 144671 185825 240557 ... $ parameters : num [1:21, 1:3] 0.000329 0.000257 0.000207 0.000174 0.000151 0.000133 0.000117 0.000101 0.000087 0.000075 ... ..- attr(, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:3] "theta0" "rho0" "maxT" $ RS : num [1:483, 1:7] 0 1 2 3 4 5 6 7 8 9 ... ..- attr(*, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:7] "k" "t_k" "lambda_k" "pi_k" ... $ TC : num[0 , 1:4] $ bootstrap :List of 3 ..$ theta0: num [1:100] 1.2e-05 1.2e-05 1.2e-05 1.2e-05 1.2e-05 1.2e-05 1.2e-05 1.2e-05 1.2e-05 1.2e-05 ... ..$ tk : num [1:23, 1:100] 0 0.0409 0.0986 0.1799 0.2944 ... ..$ lk : num [1:23, 1:100] 2.54 2.54 2.54 2.54 12.34 ...

On Wed, Feb 17, 2021 at 9:04 PM Emmanuel Paradis notifications@github.com wrote:

Can you check str(o)? It should be a (relatively long ) list with various components. Also class(o) should print "psmc".

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-781056523, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXEF5YSF76ZHYHQSITLS7SNXRANCNFSM4WQFLU6Q .

emmanuelparadis commented 3 years ago

This sounds good. Maybe your machine has reached some of its limits when you asked to do the plot. Maybe try gc() (twice) before calling plot.

selasphoruskershaw commented 3 years ago

What would that command look like? Something like this?

gc() gc() plot(o, mutation.rate=0.2e-8)

Also, I've attached my rds file output from psmcr. Can you see if you can create the plot, then I know it's a computational issue on my computer. Thanks again.

On Wed, Feb 17, 2021 at 9:18 PM Emmanuel Paradis notifications@github.com wrote:

This sounds good. Maybe your machine has reached some of its limits when you asked to do the plot. Maybe try gc() (twice) before calling plot.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-781061888, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXGPS7XHCQKHM3IPPNDS7SPKTANCNFSM4WQFLU6Q .

emmanuelparadis commented 3 years ago

Yes that looks like this.

Send me the .rds directly to me.

selasphoruskershaw commented 3 years ago

Hello,

Here's the .rds file. Please let me know if you replicate the error or are able to generate the plot. Thank you!

On Thu, Feb 18, 2021 at 3:02 AM Emmanuel Paradis notifications@github.com wrote:

Yes that looks like this.

Send me the .rds directly to me.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-781263369, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXBXPFQKHBKC2IMJZLTS7TXVFANCNFSM4WQFLU6Q .

selasphoruskershaw commented 3 years ago

Hi,

I was able to generate the plot! However, it looks strange compared to the plot for psmc. I've attached both plots generated below for the same individual. Why is the time scale different? I thought a limitation of psmc was that it cannot detect very recent events. A population crash for this population might make sense ~6000 years ago, but not 2000 years ago.

Also, is the y-axis in N x 10^4, and the x-axis in years or generations, like psmc?

Thanks again for your help-feels like we're nearly there!

[image: hr1_psmc_psmcr.PNG]

On Thu, Feb 18, 2021 at 3:02 AM Emmanuel Paradis notifications@github.com wrote:

Yes that looks like this.

Send me the .rds directly to me.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-781263369, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXBXPFQKHBKC2IMJZLTS7TXVFANCNFSM4WQFLU6Q .

emmanuelparadis commented 3 years ago

Still no attachment... maybe create a repository and upload the files there.

The plots should be similar between psmc and psmcr because the main computing code is the same. Make sure you have psmcr 0.1-2

The scales of the axes depend on whether you use scale = TRUE or not.

selasphoruskershaw commented 3 years ago

Okay, I uploaded my .rds file and the plot I output in psmc and psmcr for the same individual to Dropbox. I created shareable links and they are below.

I do have the correct version of psmcr, and I used both scale = TRUE and not and I still was getting a similar pattern (although different y (scaled time for scale = true) and x axis (theta for scale = true) values in the results as shown in the plot I posted on dropbox.

https://www.dropbox.com/s/rqy3ec9z827wmj7/hr1_psmc_psmcr.PNG?dl=0

https://www.dropbox.com/s/93ee97s3c1jlwf7/output_run_psmc%20%281%29.rds?dl=0

Thanks for your help

On Fri, Feb 19, 2021 at 8:19 PM Emmanuel Paradis notifications@github.com wrote:

Still no attachment... maybe create a repository and upload the files there.

The plots should be similar between psmc and psmcr because the main computing code is the same. Make sure you have psmcr 0.1-2

The scales of the axes depend on whether you use scale = TRUE or not.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emmanuelparadis/psmcr/issues/3#issuecomment-782556879, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZ3FXDEDW7QL4TQIT2ZARTS74Z67ANCNFSM4WQFLU6Q .

emmanuelparadis commented 3 years ago

Are you sure you run the two programs with exactly the same parameters, incluging binning, patterns ot temporal changes, and so on?

For the time scales, we did some analyses recently with some real data and the scales made sense. So I'm not sure what's going on here. But at least the two curves should have the same shape independently of the scales...

ashotmarg commented 3 years ago

Hi Emmanuel, thanks a lot for the cool package for running/plotting psmc. A small suggestion re plotting: unless I missed something, the plotting part (plot.psmc function) doesn't have an option "g" for generation in years, like the original "psmc_plot.pl" by Heng Li. I am guessing the default is currently "1 year" and one could manually recalculate the mutation.rate estimate to account for the generation time, right? In any case, I think it would be nicer with additional e.g. "g" parameter to make things more straightforward :) Thanks!

emmanuelparadis commented 3 years ago

Hi @ashotmarg2004, Thank you for your appreciation. I think you are correct and this answers one concern from @selasphoruskershaw. I've added the option g = 1 and this explanation in the help page:

The value given to mutation.rate may be either the mutation rate per year, in which case g may need to be changed, or the mutation rate per generation, in which case g may be left equal to one.

ashotmarg commented 3 years ago

Super, thanks for the prompt reply Emmanuel!