biomart / BioMart


Human readable message "Sequence unavailable" where sequence is expected #1

Open jttkim opened 10 years ago

jttkim commented 10 years ago

Some code of mine recently ran into trouble while processing sequences I retrieved from BioMart using the Bioconductor biomaRt package. The reason turned out to be that a few entries contained the message "Sequence unavailable" rather than a valid sequence (see [1] for my posting reporting this). After email discussions with a biomaRt author and looking into the BioMart code, I think the class generating these entries is likely org.biomart.processors.sequence.Sequence, and its done() method in particular [2].

I currently can't give an account of how attempts to retrieve a "coding" sequence for genes that don't have one yield a result containing the "unavailable" message. However, if my tentative analysis is correct, my suggestion would be to not return any result, rather than one containing a message instead of a sequence.

[1] https://stat.ethz.ch/pipermail/bioconductor/2014-June/060269.html
[2] https://github.com/biomart/BioMart/blob/master/plugins/sequence/src/org/biomart/processors/sequence/Sequence.java, line 275
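To illustrate the failure mode, here is a minimal Python sketch (not biomaRt or BioMart code) of a check that would catch such an entry before downstream processing. The function name `is_nucleotide_sequence` and the choice of the IUPAC nucleotide alphabet are assumptions made for illustration only.

```python
# IUPAC nucleotide codes; the alphabet is an assumption for this sketch.
IUPAC_NUCLEOTIDES = frozenset("ACGTUNRYSWKMBDHV")


def is_nucleotide_sequence(text):
    """Return True if text is non-empty and uses only IUPAC nucleotide codes."""
    cleaned = text.strip().upper()
    return bool(cleaned) and set(cleaned) <= IUPAC_NUCLEOTIDES


print(is_nucleotide_sequence("ATGGCCTGA"))
print(is_nucleotide_sequence("Sequence unavailable"))
```

A check like this rejects the human-readable message because it contains characters (and a space) outside the nucleotide alphabet, so it does not depend on the exact wording of the message.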

arekkasp commented 10 years ago

Hi Jan, I suspect that the biomaRt package is still running against the old Perl code base, not the Java one (Steffen would have to confirm it).

a


jttkim commented 10 years ago

Dear Steffen,

can you comment on Arek's response below?

Best regards & thanks in advance, Jan

On Tue, Jul 01, 2014 at 11:38:23PM -0700, Arek Kasprzyk wrote:

> Hi Jan, I suspect that the biomaRt package is still running against the old Perl code base, not the Java one (Steffen would have to confirm it).
>
> a


+- Jan T. Kim -------------------------------------------------------+
| email: jttkim@gmail.com |
| WWW: http://www.jtkim.dreamhosters.com/ |
-----=< hierarchical systems are for files, not for humans >=-----

jttkim commented 10 years ago

Hi Jan, Arek,

We're indeed still querying BioMart 0.7; the update that queries 0.9 should be available in the Bioconductor devel repository soon.

Best, Steffen


jttkim commented 10 years ago

Dear Steffen, dear Arek,

On Wed, Jul 02, 2014 at 08:58:57AM -0700, Steffen Durinck wrote:

> Hi Jan, Arek,
>
> We're indeed still querying BioMart 0.7; the update that queries 0.9 should be available in the Bioconductor devel repository soon.

ok -- thanks for looking into this, so let's wait for the new version to filter through.

Weeding out the "Sequence unavailable" entries is entirely ok for me for now; I just wanted to minimise the amount of replication of such effort around the globe.

Best regards, Jan

> Best, Steffen
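For anyone who needs the same workaround, the weeding-out step could look roughly like the following Python sketch. It assumes the results are already available as (identifier, sequence) pairs and that the placeholder is exactly the string "Sequence unavailable"; the function name `drop_unavailable` and the sample identifiers are hypothetical and not part of biomaRt or BioMart.

```python
PLACEHOLDER = "Sequence unavailable"  # message reported in this issue


def drop_unavailable(records):
    """Split (identifier, sequence) pairs into usable records and skipped identifiers."""
    kept, skipped = [], []
    for identifier, sequence in records:
        if sequence.strip() == PLACEHOLDER:
            skipped.append(identifier)
        else:
            kept.append((identifier, sequence))
    return kept, skipped


# Hypothetical example data, not real BioMart output.
records = [("geneA", "ATGGCCTGA"), ("geneB", "Sequence unavailable")]
kept, skipped = drop_unavailable(records)
print(kept)
print(skipped)
```

Returning the skipped identifiers alongside the kept records keeps the missing entries visible instead of dropping them silently.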
