JesperGrud / IMAGE

GNU Affero General Public License v3.0
GLMNET errors while performing ridge regression #1

Closed joseale2310 closed 3 years ago

joseale2310 commented 5 years ago

Dear Jesper,

I am struggling to use the IMAGE method to process data from my lab. I am getting the error below:

Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "list"
Calls: cv.glmnet -> do.call -> cv.elnet -> predict
Execution halted

It seems to be related to the glmnet package, but I am not sure exactly where or at which stage the error occurs. It usually happens at step 5, "Performing ridge regression". My files have the same number of columns and the same information as the example datasets, so I suspect the error arises because my package versions differ from the ones used when the algorithm was developed. Could you share your R session info, including the versions of the packages you use?

Best,

Jose

JesperGrud commented 5 years ago

Dear Jose

I have run IMAGE with glmnet version 2.0-16.
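For anyone comparing their own setup against the version named above, here is a minimal base-R sketch. The helper `compare_to_reference` is a hypothetical name introduced only for illustration; it is not part of IMAGE or glmnet.

```r
# Hypothetical helper (not part of IMAGE or glmnet): report how a version
# string relates to the glmnet version named in this thread (2.0-16).
compare_to_reference <- function(installed, reference = "2.0-16") {
  installed <- package_version(as.character(installed))
  reference <- package_version(reference)
  if (installed == reference) {
    "same as"
  } else if (installed > reference) {
    "newer than"
  } else {
    "older than"
  }
}

# Only query glmnet if it is actually installed on this machine.
if (requireNamespace("glmnet", quietly = TRUE)) {
  # packageVersion() prints versions with dots, e.g. 2.0.16 for 2.0-16.
  installed <- as.character(utils::packageVersion("glmnet"))
  message("Installed glmnet ", installed, " is ",
          compare_to_reference(installed), " 2.0-16")
} else {
  message("glmnet is not installed")
}
```

If a downgrade turns out to be needed, `remotes::install_version("glmnet", version = "2.0-16")` is one common route, assuming the remotes package is available and that release still builds on your R installation.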

Good luck, Jesper

Best regards,

Jesper Grud Skat Madsen Postdoc, Mandrup group

Functional Genomics & Metabolism Research Unit Department of Biochemistry and Molecular Biology University of Southern Denmark

Email: jgsm@bmb.sdu.dk · Phone: +45 65 50 23 36

Web: http://www.sdu.dk/bmb/functionalgenomics

Campusvej 55 · 5230 Odense M · Tel. 6550 1000 · www.sdu.dk

From: joseale2310 [mailto:notifications@github.com]
Sent: 27 May 2019 09:47
To: JesperGrud/IMAGE
Cc: Subscribed
Subject: [JesperGrud/IMAGE] GLMNET errors while performing ridge regression (#1)


joseale2310 commented 5 years ago

Thank you! I will give it a try. The version I am using is glmnet_2.0-18. With this version, I tried to step through the process manually in R. I loaded TestItManually.R and Stage 1 works fine, so I am guessing the problem lies in stage 2 or 3.

Best, Jose.

joseale2310 commented 5 years ago

Dear Jesper,

Going back to your version of glmnet partially works, but I now get a different error. Please see the output of the IMAGE pipeline below.

#### Welcome to IMAGE ####

Your region file contains 68709 regions and 4 conditions across 8 files
Your exon file contains 52468 genes and 4 conditions across 12 files
Your fasta file is /home/dkv252/HOMER/data/genomes/mm9/mm9.fa
You are using 10 processors.

Starting the analysis
    Setting up for parallizing motif search
    Preparing input file for motif searching
    Scanning for motifs
    Running analysis in R
        1. Reading data into R
        2. Analyzing gene expression
        3. Converting motif hits to matrix - Takes a while
        4. Calculating motif activity - Full model stage
        5. Performing ridge regression
            Sample 1 completed
            Sample 2 completed
            Sample 3 completed
            Sample 4 completed
            Sample 5 completed
            Sample 6 completed
Error in predmat[which, seq(nlami)] <- preds : 
  replacement has length zero
Calls: cv.glmnet -> do.call -> cv.elnet
Execution halted
            Sample 7 completedError in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
Execution halted
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
Execution halted

I tried cutting the RNA and enhancer data down to only the first two conditions, since the first 6 samples seemed to work fine. However, this gives me another error:

#### Welcome to IMAGE ####

Your region file contains 68709 regions and 2 conditions across 4 files
Your exon file contains 52468 genes and 2 conditions across 6 files
Your fasta file is /home/dkv252/HOMER/data/genomes/mm9/mm9.fa
You are using 10 processors.

Starting the analysis
    Setting up for parallizing motif search
    Preparing input file for motif searching
    Scanning for motifs
    Running analysis in R
        1. Reading data into R
Error in estimateCommonDisp.default(y[tagsinbin, ], group = group, lib.size = lib.size,  : 
  No genes satisfy rowsum filter
Calls: estimateTrendedDisp ... estimateTrendedDisp.default -> estimateCommonDisp -> estimateCommonDisp.default
Execution halted
        2. Analyzing gene expressionError in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
Execution halted
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
Execution halted

I guess this is related to the bins that are formed inside the regression.R script: one of those bins must contain no genes with > 5 counts in all samples. I then filtered the RNA data to keep only genes that have > 5 counts in all samples and ran IMAGE again, without success:

#### Welcome to IMAGE ####

Your region file contains 68709 regions and 2 conditions across 4 files
Your exon file contains 21887 genes and 2 conditions across 6 files
Your fasta file is /home/dkv252/HOMER/data/genomes/mm9/mm9.fa
You are using 10 processors.

Starting the analysis
    Setting up for parallizing motif search
    Preparing input file for motif searching
    Scanning for motifs
    Running analysis in R
        1. Reading data into R
        2. Analyzing gene expression
        3. Converting motif hits to matrix - Takes a while
        4. Calculating motif activity - Full model stage
Error in { : task 10 failed - "cannot allocate vector of size 395.4 Mb"
Calls: cv.glmnet -> %dopar% -> <Anonymous>
Execution halted
        5. Performing ridge regressionError in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
Execution halted
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
Execution halted

I cannot figure this error out. The data seem to be in the correct format, but the analysis fails at an earlier point than it did with all the samples. Do you have any suggestions for what I could try to make it work?

Best regards,

Jose
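The filtering step described above (keeping only genes with more than 5 counts in every sample) can be sketched in base R as follows. The count matrix and gene names are made-up toy values, and this is only an illustration of the row filter, not the code used inside IMAGE's regression.R.

```r
# Toy count matrix (invented values): rows are genes, columns are samples.
counts <- matrix(
  c(10, 12, 8,
     0,  7, 9,
     6,  6, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

# A gene is kept only if its count exceeds 5 in every sample.
keep <- rowSums(counts > 5) == ncol(counts)

# drop = FALSE keeps the result a matrix even if a single gene survives.
filtered <- counts[keep, , drop = FALSE]
print(filtered)
```

Here geneB is dropped because one of its samples has a count of 0, while geneA and geneC pass the filter.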

JesperGrud commented 5 years ago

Dear Jose,

I have never seen these errors before. I think your approach of simplifying before going on is a good idea. If you want, you can send me your files and I can try to run them. Regardless, could you please show me the full command you are issuing to IMAGE? Have you tried running the example files through?

The error in the last entry after filtering seems to be related to the amount of memory available. Can you try to either reduce the number of processors you are using or run IMAGE on a system with more resources?
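As a rough illustration of why fewer processors can help, the sketch below assumes that every parallel worker needs a similar amount of memory at the same time. That assumption, the helper name `max_workers`, and the 6 GB per-worker figure are all hypothetical; only the 32 GB machine size comes from this thread.

```r
# Hypothetical helper: how many parallel workers fit in memory if each one
# needs roughly `per_worker_gb`, leaving `reserve_gb` for the OS and R itself.
max_workers <- function(total_gb, per_worker_gb, reserve_gb = 2) {
  usable <- total_gb - reserve_gb
  if (usable <= 0 || per_worker_gb <= 0) {
    return(1L)  # fall back to a single worker
  }
  max(1L, as.integer(floor(usable / per_worker_gb)))
}

# 32 GB machine; 6 GB per worker is an invented placeholder, not a measurement.
max_workers(total_gb = 32, per_worker_gb = 6)
```

The actual per-worker footprint would have to be measured on the real data, for example by watching memory use during a single-processor run.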

Best regards, Jesper


From: joseale2310 [mailto:notifications@github.com]
Sent: 28 May 2019 15:19
To: JesperGrud/IMAGE
Cc: Jesper Grud Skat Madsen; Comment
Subject: Re: [JesperGrud/IMAGE] GLMNET errors while performing ridge regression (#1)


joseale2310 commented 5 years ago

Dear Jesper,

This is the full command I am running:

IMAGE.pl -region data/short_gt30.bed -expression data/rnaseq.txt -fasta /home/dkv252/HOMER/data/genomes/mm9/mm9.fa -RNADesign 1 1 1 2 2 2 -EnhancerDesign 1 1 2 2 -p 5 -n test

I am currently running it on a computer with 12 processors and 32 GB of RAM. Someone before me tried to run the analysis on a high-performance computer, and it seemed to run indefinitely. Maybe I could take you up on your suggestion to try running the analysis. I have given it a couple more tries without success...