genenetwork / genenetwork2

GeneNetwork (2nd generation)
http://gn2.genenetwork.org/
GNU Affero General Public License v3.0

Error mapping CFW RNAseq data #348

Closed ameliebaud closed 5 years ago

ameliebaud commented 5 years ago

Hi,

I was trying to map the Epha4 gene in the hippocampus genome-wide, without LOCO, and got the following error. I'm really keen to see the results, so please let me know how I can help or whether it's feasible.

Thanks guys!

Amelie

GeneNetwork tux01:gene:2.11-rc2-production-c2f164331 http://gn2.genenetwork.org/run_mapping ( 4:03AM UTC Dec 13, 2018)
[Errno 2] No such file or directory: u'/export/local/home/gn2/production/tmp/gn2/00d795f692ae9c308f4ba36aa8d10cd01d95c265.1.assoc.txt.assoc.txt' (error)
  File "/export/local/home/gn2/production/gene/wqflask/wqflask/marker_regression/gemma_mapping.py", line 237, in parse_loco_output
    with open(this_file) as output_file:

robwwilliams commented 5 years ago

Dear Amelie,

I used an Epha4 probeset and BXD hippocampus

http://gn2.genenetwork.org/show_trait?trait_id=1421928_at&dataset=HC_M2_0606_P

and got it to work both with and without LOCO.

BUT the plots and table LOD score values are identical. THIS is a concern. Zach, Pjotr: one of these plots is probably incorrect in terms of the computing method.

Here with LOCO: [image: Itvl_TixnbRd8.png]

Here without LOCO: [image: Itvl_d6SU77vs.png]

-- Rob

Robert W. Williams, Ph.D. Chair: Department of Genetics, Genomics and Informatics 71 S Manassas St, Memphis TN 38163 University of Tennessee Health Science Center Office 901 448-7050 CELL 901 604 4752 Office: Translational Science Research Building, Room 407 EMAIL: rwilliams@uthsc.edu Alternative email: labwilliams@gmail.com SKYPE: robwwilliams

ameliebaud commented 5 years ago

Thanks Rob! I was also able to map Epha4 this morning, but then I tried another gene (Dlgap1, hippocampus) and got the same error as yesterday for Epha4. Did you have to fix something? I noticed that I have 3 mapping options for Epha4 this morning (I only had GEMMA yesterday) but only GEMMA for Dlgap1.

Amelie. Sent from my phone.

ameliebaud commented 5 years ago

Hi again,

Sorry I think there was a misunderstanding. I was trying to use the CFW dataset, not the BXD. I don’t seem to be able to map CFW expression data in GN2, is that right? GN1 won’t let me do it either (with regular mapping, I don’t mind at this point).

Amelie

robwwilliams commented 5 years ago

Yep, mapping of CFW data fails. This will almost certainly be a problem with the genotype file. Zach will be back soon and should be able to fix it.

robwwilliams commented 5 years ago

OK, we are also aware of some semi-random failures. If the mapping worked once but the same request failed thereafter, then it is this ugly, hard-to-diagnose bug.

pjotrp commented 5 years ago

Maybe this happens when you hit the compute button twice in succession? I can imagine a race condition here where two GEMMA processes are trying to write the same file. Getting the same result may be a naming problem. @zsloan are we sure we generate different output names for the json results?
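To illustrate the naming concern, here is a minimal Python sketch of one way to give every mapping request its own output file, so that two concurrent runs cannot write to the same path. This is not the GN2 code: `unique_output_prefix` and `reserve_output_path` are hypothetical helper names, and the `.assoc.txt` suffix is only borrowed from the error message above.

```python
import os
import tempfile
import uuid


def unique_output_prefix(trait_hash: str) -> str:
    """Return an output prefix that differs on every call, even for the same trait."""
    # uuid4 is random, so two requests for the same trait get different prefixes.
    return f"{trait_hash}_{uuid.uuid4().hex}"


def reserve_output_path(directory: str, prefix: str, suffix: str = ".assoc.txt") -> str:
    """Create an empty, uniquely named file in `directory` and return its path."""
    # mkstemp creates the file atomically, so a second caller can never
    # be handed a path that already exists.
    fd, path = tempfile.mkstemp(prefix=prefix + "_", suffix=suffix, dir=directory)
    os.close(fd)
    return path
```

With names like these, a double click on the compute button would at worst start two independent runs rather than two writers on one file.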

ameliebaud commented 5 years ago

No, sorry, I get the error every time I try to map CFW data. I was not getting the error when I followed Rob’s link, but that was using BXD data.

Amelie

zsloan commented 5 years ago

The genofile is the problem. It seems that when I ran the script to update stuff, it overwrote the correct CFW file. I have no idea why this occurred, since the tag to ignore it was included, and (fortunately) it did not do this for other similar genotype files with dummy .geno files, like HSNIH-Palmer.

I'm not sure if I have access to the correct CFW genofile; it's on my work computer and I don't think there's another copy on tux01. If Arthur can move it somewhere on either Penguin or tux01 I can replace it.

pjotrp commented 5 years ago

We'll start using IPFS next year so files are never overwritten.

ameliebaud commented 5 years ago

Hi,

I’m writing to follow up on the possibility of mapping CFW data using either GN1 or GN2. Ideally I would like to do that for a paper I plan to resubmit on Wednesday. Do you think that will be possible? If not, I’ll get to work on doing that outside of GN.

Thanks,

Amelie

zsloan commented 5 years ago

I think that my initial understanding was actually wrong, and the new CFW genotypes were never moved over to GN2. This seems to be because one of the files is missing. My guess is that when I originally went to move the genotypes over, I noticed that the _snps file (a file with a list of markers and their locations) was missing. Normally I would ask Apurva to send this file, but Arthur says she's out of town, so I'll have to generate it myself this time. Fortunately that is possible because the marker names are generated from the markers' locations, though I'll have to write a script to do this.
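As a rough illustration of what such a script could look like, the Python sketch below derives one annotation line per marker from the marker name alone. Two things are assumptions and not taken from this thread: the marker naming scheme (`chr<chromosome>:<position>`, chosen purely for the example) and the output layout (marker, base-pair position, chromosome, as in a GEMMA SNP annotation file). The function names are hypothetical.

```python
import re

# Hypothetical naming scheme: "chr<chromosome>:<base-pair position>", e.g. "chr1:3010274".
MARKER_RE = re.compile(r"^chr(?P<chrom>[0-9]+|X|Y|M):(?P<pos>[0-9]+)$")


def marker_to_snps_line(marker: str) -> str:
    """Build one 'marker, position, chromosome' line from a marker name."""
    m = MARKER_RE.match(marker)
    if m is None:
        raise ValueError(f"unrecognised marker name: {marker!r}")
    return f"{marker}, {m.group('pos')}, {m.group('chrom')}"


def snps_lines_from_genotypes(geno_lines):
    """Yield one annotation line per genotype row; the marker name is the first field."""
    for line in geno_lines:
        line = line.strip()
        if not line:
            continue  # skip blank rows
        marker = re.split(r"[,\s]+", line, maxsplit=1)[0]
        yield marker_to_snps_line(marker)
```

Raising on an unrecognised name, instead of skipping it, keeps the annotation file and the genotype file at the same number of rows or fails loudly.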

One thing I should probably mention is that the new genotypes are very big (old file was 300MB, new one is 3GB), so mapping is probably not going to be very practical until Pjotr finishes upgrading GEMMA to be faster, at least if you want to use LOCO (non-LOCO should be a more manageable ~20-30 minutes).

Either way, I'll convert the file myself this afternoon and I'll let you know when it's moved over.

ameliebaud commented 5 years ago

Hi Zach,

Thanks for your reply and for making this possible. I am happy to map without LOCO. I would also be happy mapping with the old file (which I assume is the non-imputed genotypes) rather than the new one (imputed?). What Apurva (CC’ed) does here is map using non-imputed genotypes, then fine-map using imputed ones.

Amelie

zsloan commented 5 years ago

I've been looking at this and it seems there may have been some confusion on our part, though from what I'm seeing here I'm confused as to how CFW mapping ever worked with GEMMA. Looking at past e-mails, it's possible this issue was just never fully resolved. Does anyone remember doing mapping for CFW with GEMMA?

Also, does anyone know if CFW should include samples like 26305 etc.? GN1 only shows 95 samples in the 40000s, but the genotype file includes over 1000 samples (including the one I ended up getting directly from Apurva when this was discussed in an e-mail thread back in May). GEMMA itself is just throwing an error saying there's something wrong with the genotype file. Usually I think that's because of a mismatch between the number of individuals in the phenotype file and in the genotype file, but the number looks like it should be correct.
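The individual-count check mentioned above can be sketched as follows. It assumes the BIMBAM mean genotype layout (SNP id, two alleles, then one dosage per individual on each line) and a phenotype file with one line per individual. The function names are hypothetical and this is not part of GN2.

```python
import re


def bimbam_individual_count(geno_path):
    """Count individuals from the first non-empty line of a BIMBAM genotype file."""
    with open(geno_path) as handle:
        for line in handle:
            if line.strip():
                # Fields may be separated by commas and/or whitespace.
                fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
                # The first three fields are SNP id, allele 1, allele 2.
                return len(fields) - 3
    return 0


def phenotype_individual_count(pheno_path):
    """Count non-empty lines, assuming one individual per line."""
    with open(pheno_path) as handle:
        return sum(1 for line in handle if line.strip())


def counts_match(geno_path, pheno_path):
    """True when both files describe the same number of individuals."""
    return bimbam_individual_count(geno_path) == phenotype_individual_count(pheno_path)
```

A matching count does not rule out other problems (for example, samples listed in a different order), so this is only a first sanity check.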

On Mon, Dec 17, 2018 at 4:31 PM Amelie Baud notifications@github.com wrote:

Hi Zach,

Thanks for your reply and for making this possible. I am happy to map without LOCO. I would also be happy to map using the old file (which I assume is the non-imputed genotypes) rather than the new one (imputed?). What Apurva (CC'ed) does here is map using the non-imputed genotypes, then fine-map using the imputed ones.

Amelie

On 17 Dec 2018, at 13:54, zsloan notifications@github.com wrote:

I think that my initial understanding was actually wrong, and the new CFW genotypes were never moved over to GN2. This seems to be because one of the files is missing. My guess is that when I originally went to move the genotypes over I noticed that the _snps file (a file with a list of markers and their locations) was missing. Normally I would ask Apurva to send this file, but Arthur says she's out of town, so I'll have to generate it myself this time (which is fortunately possible because the marker names are generated from the marker's locations, though I'll have to write a script to do this).

One thing I should probably mention is that the new genotypes are very big (old file was 300MB, new one is 3GB), so mapping is probably not going to be very practical until Pjotr finishes upgrading GEMMA to be faster, at least if you want to use LOCO (non-LOCO should be a more manageable ~20-30 minutes).

Either way, I'll convert the file myself this afternoon and I'll let you know when it's moved over.
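As a rough illustration of how a `_snps` file could be regenerated from marker names, here is a sketch. The thread does not give the actual marker naming scheme, so the pattern below (`<chromosome>_<position>`, with an optional `chr` prefix) is purely an assumption, as are the function names. The output follows the BIMBAM SNP annotation layout: SNP id, base-pair position, chromosome.

```python
import re

# ASSUMPTION: marker names look like "chr3_3001490" or "X:500".
# The real CFW naming scheme is not stated in this thread.
MARKER_PATTERN = re.compile(r"^(?:chr)?(?P<chrom>[0-9A-Za-z]+)[_:](?P<pos>\d+)$")


def snp_annotation_line(marker_name):
    """Build one annotation line: SNP id, base-pair position, chromosome."""
    match = MARKER_PATTERN.match(marker_name)
    if match is None:
        raise ValueError("Unrecognised marker name: {!r}".format(marker_name))
    return "{}, {}, {}".format(marker_name, match.group("pos"), match.group("chrom"))


def write_snps_file(geno_path, snps_path):
    """Write one annotation line per marker found in a BIMBAM genotype file."""
    with open(geno_path) as geno, open(snps_path, "w") as out:
        for line in geno:
            if line.strip():
                # The marker name is the first field of each genotype line.
                marker = re.split(r"[,\s]+", line.strip())[0]
                out.write(snp_annotation_line(marker) + "\n")
```

Raising on an unrecognised name, rather than skipping it, keeps the annotation file in step with the genotype file line by line.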

On Mon, Dec 17, 2018 at 3:43 PM Amelie Baud notifications@github.com wrote:

Hi,

I’m writing to follow on the possibility to map CFW data using either GN1 or GN2. Ideally I would like to do that for a paper I plan to resubmit on Wednesday. Do you think that will be possible? If not I’ll get to work to do that outside of GN.

Thanks,

Amelie

On 14 Dec 2018, at 00:38, zsloan notifications@github.com wrote:

The genofile is the problem. It seems that when I ran the script to update stuff for some reason it overwrote the correct CFW file (no idea why this occurred, since the tag to ignore it was included and it (fortunately) did not do this for other similar genotype files with dummy .geno files, like HSNIH-Palmer).

I'm not sure if I have access to the correct CFW genofile; it's on my work computer and I don't think there's another copy on tux01. If Arthur can move it somewhere on either Penguin or tux01 I can replace it.


ameliebaud commented 5 years ago

Let's loop in Apurva; she's in the office and I'm sure she'll be able to help.

Amelie


robwwilliams commented 5 years ago

I think that Zach got this working today. Will confirm tomorrow.


-- Rob

Robert W. Williams, Ph.D. Chair: Department of Genetics, Genomics and Informatics 71 S Manassas St, Memphis TN 38163 University of Tennessee Health Science Center Office 901 448-7050 CELL 901 604 4752 Office: Translational Science Research Building, Room 407 EMAIL: rwilliams@uthsc.edu Alternative email: labwilliams@gmail.com SKYPE: robwwilliams

ameliebaud commented 5 years ago

Hi Zach,

Copying all the relevant emails about the CFW data:

Email1:

From: Zachary Sloan [zachary.a.sloan@gmail.com] Sent: Wednesday, April 18, 2018 10:35 AM To: Palmer, Abraham Cc: Chitre, Apurva; Natalia Gonzales; Rob Williams; Arthur Centeno; Centeno, Arturo G; Pjotr Prins; Williams, Robert Subject: Re: AIL data on GN2

The files are converted, but there's an issue I've noticed that is causing both CFW and AIL to have a problem. I'll have to change the code so that it builds the Marker objects from the BIMBAM files instead of these JSON files that were converted from the .geno ones, since our .geno ones for AIL and CFW are basically dummy files. This shouldn't be that difficult, and I'm hoping I can get it working today.

Email2:

From: Zachary Sloan [zachary.a.sloan@gmail.com] Sent: Monday, May 21, 2018 11:57 AM To: Chitre, Apurva Cc: Palmer, Abraham; Rob Williams; Pjotr Prins; Shyam Subject: Re: Issue with generating kinship file from BIMBAM

Thanks! I'll try to get it up today, and hopefully GEMMA will work after doing so.

On Mon, May 21, 2018 at 1:42 PM, Chitre, Apurva aschitre@ucsd.edu wrote:

Hi Zach,

The genotype dosage file for the CFW dataset is on Dryad:
The file that you will need to download is geno.txt.gz

Here's the link:
https://datadryad.org/bitstream/handle/10255/dryad.117923/geno.txt.gz?sequence=1

From the README on Dryad:
https://datadryad.org/bitstream/handle/10255/dryad.117925/README.txt?sequence=2

The genotype data are stored in file geno.txt. The table with
space-delimited columns contains genotype data for 1,161 mice at
92,734 SNPs. The first column ("id") is the sample id, and the second
column ("discard") indicates whether the sample should be discarded
because of flowcell samples that were mislabeled, and so we cannot be
sure of the identity of these samples.

The remaining columns give the genotypes at all SNPs. The genotypes
are represented as "dosages"; specifically, the expected number of
times the alternative allele is observed in the genotype. This will
either be an integer (0, 1 or 2), or a real number between 0 and 2
when there is some uncertainty in the estimate of the genotype.

Apurva

Apurva S. Chitre Bioinformatics Researcher, Palmer Lab Department of Psychiatry University of California San Diego Biomedical Research Facility II (BRF2); 3A32 9500 Gilman Drive La Jolla, CA 92093-0667 (201)-519-3445 aschitre@ucsd.edu
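For reference, the table layout described in the README excerpt above could be read along these lines. This is only a sketch: it assumes the first row is a header naming the columns ("id", "discard", then one column per SNP) and that missing dosages, if any, appear as `NA`. Neither detail is confirmed in this thread, and `read_dosage_table` is a hypothetical name.

```python
def parse_dosage(value):
    """Convert a dosage string to float; ASSUMPTION: 'NA' marks a missing value."""
    return None if value == "NA" else float(value)


def read_dosage_table(lines):
    """Parse a space-delimited dosage table from an iterable of text lines.

    Returns (snp_names, samples), where each sample is a tuple of
    (sample_id, discard_flag, dosages). The discard flag is kept as the raw
    string because its coding is not described in the excerpt.
    """
    rows = iter(lines)
    header = next(rows).split()  # ASSUMPTION: the first row is a header
    snp_names = header[2:]       # columns after "id" and "discard"
    samples = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue
        dosages = [parse_dosage(v) for v in fields[2:]]
        samples.append((fields[0], fields[1], dosages))
    return snp_names, samples
```

Because the function takes any iterable of lines, the compressed download could be passed in directly, for example via `gzip.open("geno.txt.gz", "rt")`.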



zsloan commented 5 years ago

This issue ended up being fixed, so closing it.