Closed pjotrp closed 5 years ago
note that you can increase debugging output with
env LOG_LEVEL=DEBUG ./bin/genenetwork2 ~/gn2_settings.py
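For readers unfamiliar with the pattern, here is a minimal sketch of how a Python application can honour a `LOG_LEVEL` environment variable. It is not GN2's actual code: the function name `level_from_env` and the fallback to INFO are assumptions made for illustration.

```python
import logging
import os


def level_from_env(default="INFO"):
    """Translate the LOG_LEVEL environment variable into a logging level.

    Hypothetical helper: unknown names fall back to logging.INFO.
    """
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    # getattr can return non-integers for odd names, so check the type.
    if not isinstance(level, int):
        level = logging.INFO
    return level


# Configure the root logger once at start-up.
logging.basicConfig(level=level_from_env())
```

With `env LOG_LEVEL=DEBUG` in front of the command, a set-up like this would emit DEBUG records as well.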
I'm guessing this only occurs with the small database; with the full one R/qtl runs fine with the old genotypes and throws the same memory error as the other two mapping methods with the new genotypes (not the error you seem to be getting).
I've gotten errors similar to that in the past and it usually is related to the number of samples/strains not matching what it sees in the genotype file.
Yes, pylmm and reaper run fine on this. OK, leave it open, I'll take a look when you are done splitting the files.
Created a separate issue for the error when loading large geno files: https://github.com/genenetwork/genenetwork2/issues/190
It works again in the browser.
And now it does not work. When I changed to the latest genotype files R/qtl broke again with that same message. Could it be that BXD.json is out of date? Why are we using two genotype formats anyway?
http://test-gn2.genenetwork.org/show_trait?trait_id=1432048_at&dataset=HC_M2_0606_P
When running R/qtl the log says:
198 individuals
3811 markers
2 phenotypes
--Cross type: f2
covnames (purged): rs6405415
No covariates
INFO:utility.benchmark:.__exit__: Total time in MarkerRegression took: 5.115181 seconds
INFO:wqflask.marker_regression.marker_regression_gn1:.__init__: Running qtlreaper
INFO:utility.tools:Found: file /home/pjotr/gn2_data/genotype/BXD.geno
reaper: parsing /home/pjotr/gn2_data/genotype/BXD.geno
reaper: done parsing
ERROR:wqflask.views:.handle_bad_request: 11:47:38 UTC 20161002: list index out of range
ERROR:wqflask.views:.handle_bad_request: 11:47:38 UTC 20161002: u'http://test-gn2.genenetwork.org/marker_regression'
ERROR:wqflask.views:.handle_bad_request: 11:47:38 UTC 20161002: Traceback (most recent call last):
File "/usr/local/guix-profiles/gn2-staging/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/guix-profiles/gn2-staging/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/pjotr/genenetwork/sumo_gn2/wqflask/wqflask/views.py", line 516, in marker_regression_page
gn1_template_vars = marker_regression_gn1.MarkerRegression(result).__dict__
File "/home/pjotr/genenetwork/sumo_gn2/wqflask/wqflask/marker_regression/marker_regression_gn1.py", line 561, in __init__
gifmap = self.plotIntMapping(intCanvas, startMb = self.startMb, endMb = self.endMb, showLocusForm= showLocusForm)
File "/home/pjotr/genenetwork/sumo_gn2/wqflask/wqflask/marker_regression/marker_regression_gn1.py", line 836, in plotIntMapping
self.drawQTL(canvas, drawAreaHeight, gifmap, plotXScale, offset=newoffset, zoom= zoom, startMb=startMb, endMb = endMb)
File "/home/pjotr/genenetwork/sumo_gn2/wqflask/wqflask/marker_regression/marker_regression_gn1.py", line 2031, in drawQTL
canvas.drawPolygon(LRSCoordXY,edgeColor=thisLRSColor,closed=0, edgeWidth=lrsEdgeWidth, clipX=(xLeftOffset, xLeftOffset + plotWidth))
File "/usr/local/guix-profiles/gn2-staging/lib/python2.7/site-packages/piddle-1.0.15gn-py2.7.egg/piddle/piddlePIL.py", line 377, in drawPolygon
if (closed or (pts[0][0]==pts[-1][0] and pts[0][1]==pts[-1][1])) \
IndexError: list index out of range
You can see /home/pjotr/gn2_data/genotype/BXD.geno is loaded (this is on Penguin2)
1630168 Sep 23 12:29 /home/pjotr/gn2_data/genotype/BXD.geno
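One reading of the traceback above is that `drawPolygon` received an empty coordinate list, so `pts[0]` fails inside piddle. A minimal defensive sketch, assuming that is the cause; the wrapper name `draw_polygon_if_possible` is hypothetical and not part of piddle or GN2:

```python
def draw_polygon_if_possible(canvas, points, **draw_kwargs):
    """Call canvas.drawPolygon only when there is something to draw.

    Returns True when the polygon was drawn and False when the point
    list was empty (the case that would raise IndexError on pts[0]).
    """
    if not points:
        return False
    canvas.drawPolygon(points, **draw_kwargs)
    return True
```

A guard like this only hides the symptom. It would still be worth logging when the list is empty, because that points at mapping results that were never filled in.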
We had some mapping method (or methods) that didn't automatically read the ".geno" files, so we converted them to JSON because it was easy to convert to and from python.
I think that maybe it was PYLMM and you changed this when you improved it, though? I just did a grep and can't find anywhere that uses them (other than one place in show_trait that isn't necessary if we're not using them), so maybe they can be removed now?
Yeah, we should remove the JSON files - makes no sense to have duplicate data. But the issue here is different. R/qtl is broken at the moment. Not sure why.
Fixed - R/qtl scanone is fine on staging.
Created a different issue for JSON files in https://github.com/genenetwork/genenetwork2/issues/202
Actually I hit the piddle error again this morning. Looks like this happens rarely and I have not been able to reproduce it later. Reopening the issue because I think it is a state problem. @zsloan What does the above code actually do that we can get an IndexError?
It appears to be column binding the cross object and the phenotypes (which are passed in as a string). I imagine the error is due to there being a mismatch between the number of samples/strains in the cross object and the number of phenotypes. The R/qtl line (with variable contents included) would end up looking like this:
the_cross$pheno <- cbind(pull.pheno(the_cross), the_pheno = c(14.129,14.166,14.110,14.098,14.232,14.000,14.270,14.188,14.204,NA,13.923,13.939,NA,13.836,13 .957,14.073,14.011,14.060,14.326,NA,NA,14.154,14.184,13.897,NA,13.984,14.408,14.056,14.058,NA,NA,NA,14.096,14.059,13.964,NA,14.064,14.007,14.262,14.106,13 .900,13.939,NA,14.087,13.707,NA,NA,NA,14.326,NA,NA,14.224,14.259,14.192,13.954,14.136,13.956,14.180,14.058,14.015,14.028,14.153,14.326,13.922,NA,NA,14.236 ,14.053,NA,14.155,13.846,14.060,14.037,NA,14.065,NA,14.222,14.108,14.043,14.410,13.986,NA,13.936,13.946,NA,14.125,13.994,NA,13.866,14.336,NA,NA,NA,NA,NA,N A,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA ,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
Have you only sometimes noticed this error for the same trait, or do you mean that the error has only occurred with certain traits (but consistently occurs with those traits)? The latter might make sense, because the inputs are the cross object (which should be the same for all traits from that group) and the phenotypes (which will be different for each trait).
Unfortunately I probably can't help with fixing the code itself, since Danny wrote all the R code.
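The mismatch described above can be made explicit on the Python side before the string is handed to R. The following is a rough sketch, not the code in `marker_regression.py`: the helper names and the dictionary of trait values keyed by sample name are assumptions for illustration.

```python
def align_phenotypes(cross_samples, trait_values):
    """Return one value per cross sample, in cross order.

    cross_samples: sample/strain names as they appear in the genotype file.
    trait_values:  dict mapping sample name to a float (hypothetical input).
    Samples without a measurement become None.
    """
    return [trait_values.get(name) for name in cross_samples]


def r_numeric_vector(values):
    """Format values as an R c(...) literal, writing None as NA."""
    items = ["NA" if v is None else "%.3f" % v for v in values]
    return "c(" + ",".join(items) + ")"


def check_row_counts(n_individuals, values):
    """Fail early, in Python, if R's cbind would see differing row counts."""
    if len(values) != n_individuals:
        raise ValueError(
            "cross has %d individuals but %d phenotype values were supplied"
            % (n_individuals, len(values))
        )
```

Padding by sample name rather than by position keeps the vector the same length as the cross object, so a sample that is missing from the trait shows up as NA instead of shifting every later value.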
Yeah, this error comes and goes on one trait. That is why I can not reproduce it (so far). I'll report when I know more.
Run R/qtl on http://gn2.genenetwork.org/show_trait?trait_id=1436869_at&dataset=HC_M2_0606_P and
GeneNetwork v2.10-pre1-master-3c46a58f5 http://gn2.genenetwork.org/marker_regression ( 5:57AM UTC Apr 30, 2017)
Traceback (most recent call last):
File "/usr/local/guix-profiles/gn2-staging/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/guix-profiles/gn2-staging/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/gn2/gene/wqflask/wqflask/views.py", line 640, in marker_regression_page
gn1_template_vars = marker_regression_gn1.MarkerRegression(result).__dict__
File "/home/gn2/gene/wqflask/wqflask/marker_regression/marker_regression_gn1.py", line 565, in __init__
gifmap = self.plotIntMapping(intCanvas, startMb = self.startMb, endMb = self.endMb, showLocusForm= showLocusForm)
File "/home/gn2/gene/wqflask/wqflask/marker_regression/marker_regression_gn1.py", line 850, in plotIntMapping
self.drawProbeSetPosition(canvas, plotXScale, offset=newoffset, zoom = zoom)
File "/home/gn2/gene/wqflask/wqflask/marker_regression/marker_regression_gn1.py", line 1040, in drawProbeSetPosition
locPixel += (self.ChrLengthDistList[i] + self.GraphInterval)*plotXScale
IndexError: list index out of range
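The failing line accumulates chromosome lengths by index, so the error suggests the index ran past the end of `ChrLengthDistList`. Below is a small sketch of the same accumulation with an explicit bounds check. The function name and signature are hypothetical, and the real loop in `drawProbeSetPosition` may differ.

```python
def chromosome_pixel_offset(chr_lengths, chr_index, graph_interval, plot_x_scale):
    """Pixel offset of the start of chromosome number chr_index (0-based).

    Sums (length + interval) * scale over all preceding chromosomes and
    raises a descriptive error instead of a bare IndexError when the
    index does not fit the list of chromosome lengths.
    """
    if not 0 <= chr_index <= len(chr_lengths):
        raise ValueError(
            "chromosome index %d out of range for %d chromosomes"
            % (chr_index, len(chr_lengths))
        )
    return sum(
        (length + graph_interval) * plot_x_scale
        for length in chr_lengths[:chr_index]
    )
```

An explicit message of this kind would at least show which chromosome list was in use when the lengths and the trait position disagree.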
@zsloan ping
Ah sorry, I'll take a look at this tomorrow morning.
I've fixed this on my branch and will include it in my next pull request. I'm not entirely sure why this problem didn't always occur, but it might have been related to the "select genofile" option.
Hi Zach, best to add a link to the patch in the issue tracker, so we know where to find it.
Not working on staging: http://gn2-guix.genenetwork.org/show_trait?trait_id=1436869_at&dataset=HC_M2_0606_P
Now it works on http://gn2-staging.genenetwork.org/show_trait?trait_id=1436869_at&dataset=HC_M2_0606_P if you set permutations to 1. @robwwilliams why are we defaulting to 2000 permutations? With R/qtl that takes forever.
Dear Pjotr, 2000 works for the HK method, but 100 is a reasonable start if we can compute it within 30-60 seconds. We can ramp up as code and computers get faster. The maximum needed by most users is about 1000, unless they are doing this for a final publication and want to brag about using a large number (10k).
-- Rob
Robert W. Williams, Ph.D. Chair: Department of Genetics, Genomics and Informatics 71 S Manassas St, Memphis TN 38163 University of Tennessee Health Science Center Office 901 448-7050 CELL 901 604 4752 Office: Translational Science Research Building, Room 407 EMAIL: rwilliams@uthsc.edu Alternative email: labwilliams@gmail.com SKYPE: robwwilliams
I think we should default to NO permutations. Most runs are exploratory, right? We should have a checkbox to switch them ON, but have them OFF by default.
I guess this will depend on the delay. There is this weird psychological effect that users want to think you are working hard on their behalf. If we can do 100 permutations in under 1 min, most users will like to see a threshold. And if the results are cached on the server so that users can accumulate permutations and/or zoom in to one chromosome, that would be a big win over GN1.
@robwwilliams best to try yourself whether you like the current default for R/qtl.
Will do!
I defer to our real experts.
Right now our default is "em" and "normal".
What we definitely need (Zach can handle) is the link to R/qtl documentation so that users can efficiently select the appropriate algorithm.
The big question I have is whether we should run any permutation analysis by default. HK should tolerate.
@kbroman do you have an opinion what defaults we should use for R/qtl? E.g. hit the 'Mapping' bar on
http://gn2.genenetwork.org/show_trait?trait_id=1436869_at&dataset=HC_M2_0606_P
@pjotrp In general I'd say method="em", model="normal", but for BXD data (that is, dense marker genotypes), I'd go with method="hk", model="normal".
For GN2, I think we can set the default to "hk" and "normal". The EM is needed when there is selective genotyping, or when marker density is low. For most of the omic data we have in GN, that does not apply. The Haley-Knott method is also well-suited for parallel computing of permutations.
Saunak
-- Śaunak Sen ... http://www.senresearch.org Prof and Chief of Biostatistics, Dept of Prev Med, UTHSC, Memphis, TN Appointments: https://saunaksen.youcanbook.me
@zsloan: for now we should default on hk and zero permutations. Once we get speed decent using parallel hk we can add permutations again. I'll add a new issue for that.
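As an illustration of the defaults proposed here, the sketch below builds the R/qtl `scanone` call as a string with "hk", "normal" and no permutations unless asked for. The names `DEFAULT_RQTL_SETTINGS` and `scanone_call` are hypothetical, and this is not how GN2 assembles its R calls. Only the `scanone` arguments `pheno.col`, `model`, `method` and `n.perm` are taken from R/qtl itself.

```python
# Hypothetical defaults reflecting the proposal in this thread.
DEFAULT_RQTL_SETTINGS = {"method": "hk", "model": "normal", "n_perm": 0}


def scanone_call(pheno_col="the_pheno", **overrides):
    """Return an R/qtl scanone(...) call as a string.

    Note: for method="hk", R/qtl expects calc.genoprob() to have been
    run on the cross object beforehand; that step is not shown here.
    """
    unknown = set(overrides) - set(DEFAULT_RQTL_SETTINGS)
    if unknown:
        raise ValueError("unknown settings: %s" % ", ".join(sorted(unknown)))

    settings = dict(DEFAULT_RQTL_SETTINGS)
    settings.update(overrides)

    call = 'scanone(the_cross, pheno.col="%s", model="%s", method="%s"' % (
        pheno_col,
        settings["model"],
        settings["method"],
    )
    # Only request a permutation test when the user switched it on.
    if settings["n_perm"] > 0:
        call += ", n.perm=%d" % settings["n_perm"]
    return call + ")"
```

Keeping permutations opt-in means an exploratory run returns quickly, while `scanone_call(n_perm=200)` still yields a thresholded scan when one is wanted.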
We OK with this now?
No and yes. Just checked BXD Phenotype Trait 12660 using default setting. No result after 3 minutes, but finally the output below after about 4 minutes.
The issue is the number of permutations (default at 2000). A simple fix would be to reduce to 200 permutations for this code.
[image: image.png]
I just made this change on my branch, and I'll go ahead and push it later today since it's very simple.
For some reason permutations weren't set at 200 when I checked just now, but I changed them so this issue should be okay now.
When running R/qtl on BXD with URL/show_trait?trait_id=1435395_s_at&dataset=HC_M2_0606_P using the small database and the latest geno files I get the error
File "/export2/izip/git/opensource/genenetwork/sumo_gn2/wqflask/wqflask/marker_regression/marker_regression.py", line 169, in __init__
results = self.run_rqtl_geno()
File "/export2/izip/git/opensource/genenetwork/sumo_gn2/wqflask/wqflask/marker_regression/marker_regression.py", line 418, in run_rqtl_geno
cross_object = self.add_phenotype(cross_object, self.sanitize_rqtl_phenotype()) # Add the phenotype
File "/export2/izip/git/opensource/genenetwork/sumo_gn2/wqflask/wqflask/marker_regression/marker_regression.py", line 458, in add_phenotype
ro.r('the_cross$pheno <- cbind(pull.pheno(the_cross), the_pheno = '+ pheno_as_string +')')
(...)
RRuntimeError: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 198, 92
This is after the fix in https://github.com/genenetwork/genenetwork2/commit/ba3303636be68cbbca15ebcdfae2176cbcaa923e
Interestingly, interval mapping and pylmm still run. Zach, can you check why this is?