genenetwork / genenetwork2

GeneNetwork (2nd generation)
http://gn2.genenetwork.org/
GNU Affero General Public License v3.0

R/qtl no longer running on master branch #186

Closed: pjotrp closed this issue 5 years ago.

pjotrp commented 8 years ago

When running R/qtl on BXD with URL/show_trait?trait_id=1435395_s_at&dataset=HC_M2_0606_P, using the small database and the latest geno files, I get this error:

    File "/export2/izip/git/opensource/genenetwork/sumo_gn2/wqflask/wqflask/marker_regression/marker_regression.py", line 169, in __init__
      results = self.run_rqtl_geno()
    File "/export2/izip/git/opensource/genenetwork/sumo_gn2/wqflask/wqflask/marker_regression/marker_regression.py", line 418, in run_rqtl_geno
      cross_object = self.add_phenotype(cross_object, self.sanitize_rqtl_phenotype()) # Add the phenotype
    File "/export2/izip/git/opensource/genenetwork/sumo_gn2/wqflask/wqflask/marker_regression/marker_regression.py", line 458, in add_phenotype
      ro.r('the_cross$pheno <- cbind(pull.pheno(the_cross), the_pheno = '+ pheno_as_string +')')
    (...)
    RRuntimeError: Error in data.frame(..., check.names = FALSE) :
      arguments imply differing number of rows: 198, 92

This is after the fix in https://github.com/genenetwork/genenetwork2/commit/ba3303636be68cbbca15ebcdfae2176cbcaa923e

Interestingly, interval mapping and pylmm still run. Zach, can you check why?

pjotrp commented 8 years ago

Note that you can increase the debugging output with:

    env LOG_LEVEL=DEBUG ./bin/genenetwork2 ~/gn2_settings.py

zsloan commented 8 years ago

I'm guessing this only occurs with the small database. With the full one, R/qtl runs fine with the old genotypes, and with the new genotypes it throws the same memory error as the other two mapping methods (not the error you seem to be getting).

I've seen errors similar to that in the past, and they are usually caused by the number of samples/strains not matching what R/qtl sees in the genotype file.
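A minimal sketch of the check zsloan describes, assuming the trait values arrive as a dict keyed by strain name and the genotype file's sample names are available as a list (`align_to_genotypes` is a hypothetical helper, not GN2 code). It emits exactly one value per genotype sample, so the phenotype vector always has as many rows as the cross:

```python
from typing import Dict, List, Optional, Tuple


def align_to_genotypes(
    geno_samples: List[str], trait_values: Dict[str, float]
) -> Tuple[List[Optional[float]], List[str]]:
    """Align trait values to the sample order of the genotype file.

    Returns (aligned, dropped): `aligned` has exactly one entry per genotype
    sample (None where the strain has no value, to be written as NA), and
    `dropped` lists phenotyped strains that are absent from the genotype file.
    """
    geno_set = set(geno_samples)
    dropped = sorted(name for name in trait_values if name not in geno_set)
    aligned = [trait_values.get(name) for name in geno_samples]
    return aligned, dropped


# Illustrative strain names and values:
aligned, dropped = align_to_genotypes(
    ["BXD1", "BXD2", "BXD5"], {"BXD1": 14.129, "BXD5": 14.110, "DBA/2J": 13.9}
)
print(aligned)  # [14.129, None, 14.11]
print(dropped)  # ['DBA/2J']
```

Strains missing from the genotype file are returned separately so they can be logged, rather than silently shifting the rows that R receives.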

(Email reply to Pjotr Prins's comment of Sun, Sep 25, 2016 at 5:25 AM.)

pjotrp commented 8 years ago

Yes, pylmm and reaper run fine on this. OK, leave it open; I'll take a look when you are done splitting the files.

pjotrp commented 8 years ago

Created a separate issue for the error when loading large geno files: https://github.com/genenetwork/genenetwork2/issues/190

pjotrp commented 8 years ago

It works again in the browser.

pjotrp commented 8 years ago

And now it does not work. When I changed to the latest genotype files R/qtl broke again with that same message. Could it be that BXD.json is out of date? Why are we using two genotype formats anyway?

pjotrp commented 8 years ago

http://test-gn2.genenetwork.org/show_trait?trait_id=1432048_at&dataset=HC_M2_0606_P

When running R/qtl the log says:

         198  individuals
         3811  markers
         2  phenotypes
     --Cross type: f2
    covnames (purged):  rs6405415
    No covariates
    INFO:utility.benchmark:.__exit__:   Total time in MarkerRegression took: 5.115181 seconds
    INFO:wqflask.marker_regression.marker_regression_gn1:.__init__: Running qtlreaper
    INFO:utility.tools:Found: file /home/pjotr/gn2_data/genotype/BXD.geno

    reaper: parsing /home/pjotr/gn2_data/genotype/BXD.geno
    reaper: done parsing
    ERROR:wqflask.views:.handle_bad_request: 11:47:38 UTC 20161002: list index out of range
    ERROR:wqflask.views:.handle_bad_request: 11:47:38 UTC 20161002: u'http://test-gn2.genenetwork.org/marker_regression'
    ERROR:wqflask.views:.handle_bad_request: 11:47:38 UTC 20161002: Traceback (most recent call last):
      File "/usr/local/guix-profiles/gn2-staging/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1475, in full_dispatch_request
        rv = self.dispatch_request()
      File "/usr/local/guix-profiles/gn2-staging/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1461, in dispatch_request
        return self.view_functions[rule.endpoint](**req.view_args)
      File "/home/pjotr/genenetwork/sumo_gn2/wqflask/wqflask/views.py", line 516, in marker_regression_page
        gn1_template_vars = marker_regression_gn1.MarkerRegression(result).__dict__
      File "/home/pjotr/genenetwork/sumo_gn2/wqflask/wqflask/marker_regression/marker_regression_gn1.py", line 561, in __init__
        gifmap = self.plotIntMapping(intCanvas, startMb = self.startMb, endMb = self.endMb, showLocusForm= showLocusForm)
      File "/home/pjotr/genenetwork/sumo_gn2/wqflask/wqflask/marker_regression/marker_regression_gn1.py", line 836, in plotIntMapping
        self.drawQTL(canvas, drawAreaHeight, gifmap, plotXScale, offset=newoffset, zoom= zoom, startMb=startMb, endMb = endMb)
      File "/home/pjotr/genenetwork/sumo_gn2/wqflask/wqflask/marker_regression/marker_regression_gn1.py", line 2031, in drawQTL
        canvas.drawPolygon(LRSCoordXY,edgeColor=thisLRSColor,closed=0, edgeWidth=lrsEdgeWidth, clipX=(xLeftOffset, xLeftOffset + plotWidth))
      File "/usr/local/guix-profiles/gn2-staging/lib/python2.7/site-packages/piddle-1.0.15gn-py2.7.egg/piddle/piddlePIL.py", line 377, in drawPolygon
        if (closed or (pts[0][0]==pts[-1][0] and pts[0][1]==pts[-1][1])) \
    IndexError: list index out of range

You can see that /home/pjotr/gn2_data/genotype/BXD.geno is loaded (this is on Penguin2):

    1630168 Sep 23 12:29 /home/pjotr/gn2_data/genotype/BXD.geno

zsloan commented 8 years ago

We had some mapping method (or methods) that didn't read the ".geno" files natively, so we converted them to JSON, because it was easy to convert to and from Python.

I think it may have been pylmm, and you changed this when you improved it? I just did a grep and can't find anywhere that uses them (other than one place in show_trait that isn't necessary if we're not using them), so maybe they can be removed now?

(Email reply to Pjotr Prins's comment of Sat, Oct 1, 2016 at 2:16 AM.)

pjotrp commented 8 years ago

Yes, we should remove the JSON files; it makes no sense to keep duplicate data. But the issue here is different: R/qtl is broken at the moment, and I'm not sure why.
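Since the JSON copies only duplicate what is already in the `.geno` file, they can be regenerated on demand. A sketch of such a parser, assuming the usual GeneNetwork `.geno` layout (`#` comment lines, `@key:value` metadata lines such as `@mat:B`, then a tab-separated table whose header starts with `Chr`, `Locus`, `cM`, optionally `Mb`, followed by one column per strain). `parse_geno` is hypothetical and is not the converter GN2 actually used:

```python
import io
import json
from typing import Dict, List, TextIO


def parse_geno(handle: TextIO) -> Dict[str, object]:
    """Parse a GeneNetwork-style .geno file into a JSON-serialisable dict."""
    metadata: Dict[str, str] = {}
    header: List[str] = []
    n_info = 3  # Chr, Locus, cM (+ Mb when present)
    markers: List[Dict[str, object]] = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@"):
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()  # e.g. mat -> B, pat -> D
            continue
        fields = line.split("\t")
        if not header:
            header = fields  # first table row is the column header
            n_info = 4 if len(header) > 3 and header[3] == "Mb" else 3
            continue
        marker: Dict[str, object] = {
            "chr": fields[0],
            "locus": fields[1],
            "cM": float(fields[2]),
        }
        if n_info == 4:
            marker["Mb"] = float(fields[3])
        marker["genotypes"] = fields[n_info:]  # one symbol per strain
        markers.append(marker)
    return {"metadata": metadata, "samples": header[n_info:], "markers": markers}


# Toy input, not real BXD data.
toy = io.StringIO(
    "# comment line\n"
    "@name:BXD\n"
    "@mat:B\n"
    "@pat:D\n"
    "Chr\tLocus\tcM\tMb\tBXD1\tBXD2\n"
    "1\tm1\t0.0\t3.0\tB\tD\n"
)
print(json.dumps(parse_geno(toy)))
```

Deriving JSON from the `.geno` file at load time keeps a single source of truth, which removes the risk Pjotr raised of `BXD.json` going out of date.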

pjotrp commented 8 years ago

Fixed: R/qtl scanone works on staging.

Created a different issue for JSON files in https://github.com/genenetwork/genenetwork2/issues/202

pjotrp commented 8 years ago

Actually, I hit the piddle error again this morning. It seems to happen rarely, and I have not been able to reproduce it since. I'm reopening the issue because I think it is a state problem. @zsloan, what does the code above actually do that it can raise an IndexError?
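Judging from the traceback earlier in this thread, the IndexError is raised inside piddle's `drawPolygon` by `pts[0][0]==pts[-1][0]`, which fails whenever the point list is empty. In `drawQTL` that list is `LRSCoordXY`, so presumably no LRS points were collected for the plotted region in those runs. A defensive sketch, assuming it is acceptable to skip the curve in that case (`draw_lrs_curve` is a hypothetical wrapper; only the `drawPolygon` call comes from the traceback):

```python
from typing import Any, List, Tuple

Point = Tuple[float, float]


def draw_lrs_curve(canvas: Any, coords: List[Point], **style: Any) -> bool:
    """Draw the LRS curve only when there are points to draw.

    piddle's drawPolygon reads pts[0] and pts[-1], so an empty list raises
    IndexError, and a single point cannot form a line anyway.
    Returns True if the curve was drawn.
    """
    if len(coords) < 2:
        return False  # nothing to plot for this region
    canvas.drawPolygon(coords, **style)
    return True
```

Callers would pass through the same keyword arguments as the original call (`edgeColor`, `closed`, `edgeWidth`, `clipX`). This only hides the symptom; why the list is sometimes empty still needs explaining.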

zsloan commented 8 years ago

It appears to column-bind the cross object's existing phenotypes (pull.pheno(the_cross)) with the new phenotype values, which are passed in as a string. I imagine the error is due to a mismatch between the number of samples/strains in the cross object and the number of phenotype values. With the variable contents filled in, the R/qtl line would end up looking like this:

    the_cross$pheno <- cbind(pull.pheno(the_cross), the_pheno = c(
        14.129,14.166,14.110,14.098,14.232,14.000,14.270,14.188,14.204,NA,
        13.923,13.939,NA,13.836,13.957,14.073,14.011,14.060,14.326,NA,
        NA,14.154,14.184,13.897,NA,13.984,14.408,14.056,14.058,NA,
        NA,NA,14.096,14.059,13.964,NA,14.064,14.007,14.262,14.106,
        13.900,13.939,NA,14.087,13.707,NA,NA,NA,14.326,NA,
        NA,14.224,14.259,14.192,13.954,14.136,13.956,14.180,14.058,14.015,
        14.028,14.153,14.326,13.922,NA,NA,14.236,14.053,NA,14.155,
        13.846,14.060,14.037,NA,14.065,NA,14.222,14.108,14.043,14.410,
        13.986,NA,13.936,13.946,NA,14.125,13.994,NA,13.866,14.336,
        NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
        NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
        NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
        NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
        NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
        NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
        NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
        NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
        NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
        NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
        NA,NA,NA,NA,NA,NA,NA,NA))

Have you only sometimes noticed this error for the same trait, or do you mean that the error has only occurred with certain traits (but consistently occurs with those traits)? The latter might make sense, because the inputs are the cross object (which should be the same for all traits from that group) and the phenotypes (which will be different for each trait).

Unfortunately, I probably can't help much with fixing the code itself, since Danny wrote all the R code.
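zsloan's explanation suggests a cheap safeguard before the `cbind`: build the `c(...)` literal in Python and compare its length with the number of individuals in the cross, so a mismatch produces a clear message instead of R's "differing number of rows". A sketch, assuming the values are already in cross order (None for missing) and that the individual count is obtained separately, for example from R/qtl's `nind()`; `to_r_vector` is hypothetical:

```python
import math
from typing import Optional, Sequence


def to_r_vector(values: Sequence[Optional[float]], n_individuals: int) -> str:
    """Render values as an R numeric vector literal, e.g. c(14.129,NA).

    Raises ValueError when the length differs from the cross, instead of
    letting R fail later with 'arguments imply differing number of rows'.
    """
    if len(values) != n_individuals:
        raise ValueError(
            f"{len(values)} phenotype values for {n_individuals} individuals"
        )
    parts = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            parts.append("NA")  # R's missing-value marker
        else:
            parts.append(repr(float(v)))
    return "c(" + ",".join(parts) + ")"


print(to_r_vector([14.129, None, 14.11], 3))  # c(14.129,NA,14.11)
```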

pjotrp commented 8 years ago

Yes, this error comes and goes for one trait, which is why I cannot reproduce it (so far). I'll report when I know more.

pjotrp commented 7 years ago

Running R/qtl on http://gn2.genenetwork.org/show_trait?trait_id=1436869_at&dataset=HC_M2_0606_P gives:

    GeneNetwork v2.10-pre1-master-3c46a58f5  http://gn2.genenetwork.org/marker_regression ( 5:57AM UTC Apr 30, 2017)
       Traceback (most recent call last):
         File "/usr/local/guix-profiles/gn2-staging/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1475, in full_dispatch_request
           rv = self.dispatch_request()
         File "/usr/local/guix-profiles/gn2-staging/lib/python2.7/site-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1461, in dispatch_request
           return self.view_functions[rule.endpoint](**req.view_args)
         File "/home/gn2/gene/wqflask/wqflask/views.py", line 640, in marker_regression_page
           gn1_template_vars = marker_regression_gn1.MarkerRegression(result).__dict__
         File "/home/gn2/gene/wqflask/wqflask/marker_regression/marker_regression_gn1.py", line 565, in __init__
           gifmap = self.plotIntMapping(intCanvas, startMb = self.startMb, endMb = self.endMb, showLocusForm= showLocusForm)
         File "/home/gn2/gene/wqflask/wqflask/marker_regression/marker_regression_gn1.py", line 850, in plotIntMapping
           self.drawProbeSetPosition(canvas, plotXScale, offset=newoffset, zoom = zoom)
         File "/home/gn2/gene/wqflask/wqflask/marker_regression/marker_regression_gn1.py", line 1040, in drawProbeSetPosition
           locPixel += (self.ChrLengthDistList[i] + self.GraphInterval)*plotXScale
       IndexError: list index out of range

pjotrp commented 7 years ago

@zsloan ping

zsloan commented 7 years ago

Ah sorry, I'll take a look at this tomorrow morning.

(Email reply to a GitHub notification of Sun, May 14, 2017 at 4:25 AM: Pjotr Prins assigned #186 to @zsloan.)

zsloan commented 7 years ago

I've fixed this on my branch and will include it in my next pull request. I'm not entirely sure why this problem didn't always occur, but it might have been related to the "select genofile" option.
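The fix itself isn't linked here, but the failing line in the traceback above (`locPixel += (self.ChrLengthDistList[i] + self.GraphInterval)*plotXScale`) appears to sum chromosome lengths plus a fixed gap to find where the probe's chromosome starts on the x axis. If the probe sits on a chromosome that is not in the plotted list (plausibly after choosing a different genotype file), the index runs past the end. A hedged sketch of that computation with an explicit bounds check; `chromosome_offset` and its parameters are hypothetical:

```python
from typing import List, Optional


def chromosome_offset(chr_lengths: List[float], chr_index: int,
                      graph_interval: float, x_scale: float) -> Optional[float]:
    """Pixel offset at which chromosome `chr_index` (0-based) starts.

    Adds (length + gap) for every preceding chromosome, which is what the
    failing line appears to accumulate. Returns None instead of raising
    IndexError when the chromosome is not among those plotted.
    """
    if not 0 <= chr_index < len(chr_lengths):
        return None
    return sum((length + graph_interval) * x_scale
               for length in chr_lengths[:chr_index])
```

A caller could then skip drawing the probe-set position when the offset is None, rather than failing the whole mapping page.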

pjotrp commented 7 years ago

Hi Zach, it's best to add a link to the patch in the issue tracker so we know where to find it.

pjotrp commented 7 years ago

Not working on staging: http://gn2-guix.genenetwork.org/show_trait?trait_id=1436869_at&dataset=HC_M2_0606_P

pjotrp commented 6 years ago

Now it works on http://gn2-staging.genenetwork.org/show_trait?trait_id=1436869_at&dataset=HC_M2_0606_P if you set permutations to 1. @robwwilliams, why are we defaulting to 2000 permutations? With R/qtl that takes forever.
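Why permutations are so slow: each one shuffles the phenotype values across strains, reruns the whole genome scan and keeps only the maximum LOD, so 2000 permutations cost roughly 2000 scans. A deliberately simplified sketch of the usual permutation approach to a genome-wide threshold, assuming complete genotypes coded 0/1 at each marker and plain single-marker regression (this is not R/qtl's `scanone`, which also handles missing genotypes and positions between markers):

```python
import math
import random
from typing import List, Sequence


def marker_lod(pheno: Sequence[float], geno: Sequence[int]) -> float:
    """LOD for one marker with complete two-class genotypes (coded 0/1).

    Single-marker regression: LOD = n/2 * log10(RSS0 / RSS1), where RSS0 is the
    residual sum of squares around the overall mean and RSS1 around the
    genotype-class means.
    """
    n = len(pheno)
    mean = sum(pheno) / n
    rss0 = sum((y - mean) ** 2 for y in pheno)
    rss1 = 0.0
    for g in set(geno):
        group = [y for y, gg in zip(pheno, geno) if gg == g]
        group_mean = sum(group) / len(group)
        rss1 += sum((y - group_mean) ** 2 for y in group)
    if rss0 == 0.0:
        return 0.0  # constant phenotype: no signal anywhere
    if rss1 == 0.0:
        return math.inf  # genotype classes explain the phenotype perfectly
    return n / 2 * math.log10(rss0 / rss1)


def genome_scan_max(pheno: Sequence[float], markers: Sequence[Sequence[int]]) -> float:
    """Maximum LOD over all markers, i.e. one full genome scan."""
    return max(marker_lod(pheno, geno) for geno in markers)


def permutation_threshold(pheno: Sequence[float], markers: Sequence[Sequence[int]],
                          n_perm: int, alpha: float = 0.05, seed: int = 1) -> float:
    """Empirical genome-wide (1 - alpha) LOD threshold from n_perm permutations."""
    if n_perm < 1:
        raise ValueError("need at least one permutation")
    rng = random.Random(seed)
    shuffled = list(pheno)
    maxima: List[float] = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        maxima.append(genome_scan_max(shuffled, markers))  # one full scan each
    maxima.sort()
    return maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]
```

The loop makes the cost explicit: run time grows linearly with `n_perm`, so lowering the default (or spreading permutations over several cores) is the obvious way to speed things up.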

robwwilliams commented 6 years ago

Dear Pjotr, 2000 works for the HK method, but 100 is a reasonable start if we can compute it within 30 to 60 seconds. We can ramp up as the code and computers get faster. The maximum most users need is about 1000, unless they are doing this for a final publication and want to brag about using a large number (10k).

(Email reply to Pjotr Prins's comment of Sun, Apr 1, 2018 at 2:41 AM.)

-- Rob

Robert W. Williams, Ph.D. Chair: Department of Genetics, Genomics and Informatics 71 S Manassas St, Memphis TN 38163 University of Tennessee Health Science Center Office 901 448-7050 CELL 901 604 4752 Office: Translational Science Research Building, Room 407 EMAIL: rwilliams@uthsc.edu Alternative email: labwilliams@gmail.com SKYPE: robwwilliams

pjotrp commented 6 years ago

I think we should default to NO permutations. Most runs are exploratory, right? We should have a checkbox to switch them ON, but have them OFF by default.

robwwilliams commented 6 years ago

I guess this will depend on the delay. There is a curious psychological effect: users like to think you are working hard on their behalf. If we can do 100 permutations in under a minute, most users will like to see a threshold. And if the results are cached on the server so that users can accumulate permutations and/or zoom in to one chromosome, that would be a big win over GN1.
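Rob's idea of accumulating permutations could work roughly like this: keep the permutation maxima per trait and, when a user asks for more, compute only the difference. A minimal in-process sketch; the key layout and the `PermutationCache` class are hypothetical, and a real deployment would need storage shared between server workers:

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cache key: (dataset, trait_id, mapping method).
CacheKey = Tuple[str, str, str]


class PermutationCache:
    """Keep permutation maxima per trait so extra permutations add to earlier ones."""

    def __init__(self) -> None:
        self._maxima: Dict[CacheKey, List[float]] = {}

    def get(self, key: CacheKey, n_perm: int,
            run_batch: Callable[[int], List[float]]) -> List[float]:
        """Return n_perm maxima, running only the permutations not cached yet.

        run_batch(k) must return k new maxima computed with fresh random
        shuffles; otherwise accumulated batches would repeat each other.
        """
        cached = self._maxima.setdefault(key, [])
        missing = n_perm - len(cached)
        if missing > 0:
            cached.extend(run_batch(missing))  # compute only the shortfall
        return cached[:n_perm]
```

For example, a first request for 100 permutations followed by one for 300 would run 100 and then only 200 more, and a later request for 50 would run nothing.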

(Email reply to Pjotr Prins's comment of Sun, Apr 1, 2018 at 3:19 AM.)

pjotrp commented 6 years ago

@robwwilliams, best to try it yourself and see whether you like the current default for R/qtl.

robwwilliams commented 6 years ago

Will do!

(Email reply to Pjotr Prins's comment of Mon, Apr 2, 2018 at 4:01 AM.)

robwwilliams commented 6 years ago

I defer to our real experts.

Right now our default is "em" and "normal".

What we definitely need (Zach can handle this) is a link to the R/qtl documentation so that users can efficiently select the appropriate algorithm.

The big question I have is whether we should run any permutation analysis by default. HK should tolerate it.

(Email reply to Pjotr Prins's comment of Mon, Apr 2, 2018 at 4:01 AM.)

pjotrp commented 6 years ago

@kbroman, do you have an opinion on what defaults we should use for R/qtl? For example, hit the 'Mapping' bar on

http://gn2.genenetwork.org/show_trait?trait_id=1436869_at&dataset=HC_M2_0606_P

kbroman commented 6 years ago

@pjotrp In general I'd say method="em", model="normal", but for BXD data (that is, dense marker genotypes), I'd go with method="hk", model="normal".

robwwilliams commented 6 years ago

For GN2, I think we can set the default to "hk" and "normal". EM is needed when there is selective genotyping or when marker density is low, which does not apply to most of the omic data we have in GN. The Haley-Knott method is also well suited to parallel computation of permutations.

Saunak

(Email reply to Rob Williams's message of Tue, Apr 03, 2018 at 04:58:33PM -0500.)

-- Śaunak Sen ... http://www.senresearch.org Prof and Chief of Biostatistics, Dept of Prev Med, UTHSC, Memphis, TN Appointments: https://saunaksen.youcanbook.me

pjotrp commented 6 years ago

@zsloan: for now we should default to hk and zero permutations. Once we get decent speed using parallel hk, we can add permutations again. I'll open a new issue for that.
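A sketch of what these defaults could look like if the R/qtl calls are built as strings (the thread shows GN2 passing R code to `ro.r`). The R functions `calc.genoprob()` and `scanone()` and the `pheno.col`, `method`, `model`, `n.perm` and `n.cluster` arguments are real R/qtl API; the Python helper and the R variable names (`the_cross`, `the_pheno`, `qtl`, `perm`) are assumptions for illustration. It only handles the two methods discussed here, both of which work from genotype probabilities:

```python
from typing import List

SCANONE_METHODS = {"em", "hk"}  # the two methods discussed in this thread
SCANONE_MODELS = {"normal", "binary", "2part", "np"}


def scanone_calls(cross: str = "the_cross", pheno_col: str = "the_pheno",
                  method: str = "hk", model: str = "normal",
                  n_perm: int = 0, n_cluster: int = 1) -> List[str]:
    """Build R/qtl commands for a genome scan, optionally with permutations."""
    if method not in SCANONE_METHODS:
        raise ValueError(f"unsupported scanone method: {method}")
    if model not in SCANONE_MODELS:
        raise ValueError(f"unknown scanone model: {model}")
    # EM and HK both use genotype probabilities, so compute them first.
    calls = [f"{cross} <- calc.genoprob({cross})"]
    common = f'{cross}, pheno.col="{pheno_col}", method="{method}", model="{model}"'
    calls.append(f"qtl <- scanone({common})")
    if n_perm > 0:
        # With n.perm set, scanone returns permutation maxima instead of a
        # LOD profile, so the permutations are a separate call.
        extra = f", n.cluster={n_cluster}" if n_cluster > 1 else ""
        calls.append(f"perm <- scanone({common}, n.perm={n_perm}{extra})")
    return calls
```

Keeping the permutation run as a separate command makes it easy to switch permutations on later (for example via the checkbox Pjotr suggested) without touching the main scan, and `n.cluster` is where the parallel HK that Saunak mentioned would plug in.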

pjotrp commented 5 years ago

Are we OK with this now?

robwwilliams commented 5 years ago

No and yes. I just checked BXD Phenotype Trait 12660 using the default settings. There was no result after 3 minutes, but I finally got the output below after about 4 minutes.

The issue is the number of permutations (default 2000). A simple fix would be to reduce this to 200 permutations for this code.

[image: image.png]
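A back-of-the-envelope check of the proposed change, assuming run time is dominated by the permutations and scales linearly with their number (neither is measured here), starting from the roughly 4 minutes (240 s) Rob observed with 2000 permutations:

$$
t_{200} \approx t_{2000} \times \frac{200}{2000} \approx 240\ \text{s} \times 0.1 = 24\ \text{s}
$$

That estimate sits inside the 30 to 60 second budget Rob suggested earlier, although any fixed overhead (the unpermuted scan itself, loading the genotypes) comes on top.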

(Email reply to Pjotr Prins's comment of Wed, Feb 13, 2019 at 6:27 AM.)

zsloan commented 5 years ago

I just made this change on my branch, and I'll go ahead and push it later today since it's very simple.

(Email reply to robwwilliams's comment of Wed, Feb 13, 2019 at 7:48 AM.)

zsloan commented 5 years ago

For some reason, permutations weren't set to 200 when I checked just now, but I've changed them, so this issue should be resolved now.