bioinfo-ut / PhenotypeSeeker

Identify phenotype-specific k-mers and predict phenotype using sequenced bacterial strains
GNU General Public License v3.0

Error encountered during model generation: 'struct.error' #22

Open Tonny-zhou opened 1 year ago

Tonny-zhou commented 1 year ago

Dear PhenotypeSeeker Development Team,

I hope this message finds you well. I am writing to report an issue I encountered while using the PhenotypeSeeker software. During the model generation phase (stdout: "Generating the random forest model for phenotype"), I received the following error: struct.error: 'i' format requires -2147483648 <= number <= 2147483647

I am unsure of the exact meaning and cause of this error. Could you please provide some insights into the nature of this error and any potential solutions or troubleshooting steps that I can take to address it?

Additionally, I would like to mention that my training dataset includes 3000 genomes, and I have kept the k-mer length at the default value. My server has 250 threads and 1TB of memory available for running the analysis.

I appreciate the efforts of your team in developing PhenotypeSeeker and its valuable functionalities. I am eager to continue utilizing this software for my research purposes, and resolving this error would greatly aid in my analyses.

Thank you for your attention to this matter. I look forward to your response and guidance.

Best regards, Z

erkiaun commented 1 year ago

Dear Tonny-Zhou,

It seems to be related to exceeding a size limit in Python's pickle module, which I use for serialization. A similar report is uqfoundation/dill issue #256, "struct.error: 'I' format requires 0 <= number <= 4294967295": https://github.com/uqfoundation/dill/issues/256
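For context, the bounds in the error message are the range of a signed 32-bit integer. A minimal sketch of how the same exception arises whenever a byte count above 2^31 - 1 is packed into such a field is shown below. The helper name `fits_int32_header` is hypothetical, and where exactly PhenotypeSeeker hits this limit has not been established in this thread.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647, the upper bound quoted in the error


def fits_int32_header(n_bytes: int) -> bool:
    """Return True if n_bytes can be packed as a signed 32-bit integer.

    Hypothetical helper for illustration only; it is not part of
    PhenotypeSeeker or of the pickle module.
    """
    try:
        struct.pack("!i", n_bytes)  # 'i' = signed 32-bit integer field
    except struct.error:
        return False
    return True


if __name__ == "__main__":
    print(fits_int32_header(INT32_MAX))      # largest value that still fits
    print(fits_int32_header(INT32_MAX + 1))  # one byte past the limit
```

If this is the cause, a serialized object of 2 GiB or more would trigger the error regardless of how much RAM the server has, which would be consistent with the failure depending on dataset size rather than on available memory.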

I am sorry, but I doubt that I will have the time to debug and fix this in the near future. I would also probably need a dataset that triggers this error to pinpoint the faulty spot in the code. I recently ran about 6000 bacterial genomes and did not get this error.

Could you send me the full error message that is printed to your console? Could you also test whether it runs without error in your environment with fewer genomes, say about 300?

Best regards, Erki

Tonny-zhou commented 1 year ago

Dear Erki, thank you for your quick reply. The full error message is in the attached file: RF.log

Tonny-zhou commented 1 year ago

Yes, you are right. If I run with fewer genomes, I get the trained model without error.
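For anyone repeating the reduced-size test, a minimal sketch of drawing a reproducible random subset of samples is shown below. It assumes the sample list is a text file with one sample per line and an optional header line. The function name `subsample_rows` and the column names in the demo are hypothetical, so adjust them to your actual input file.

```python
import random


def subsample_rows(lines, n, seed=0, has_header=True):
    """Return the header (if any) plus n randomly chosen data rows.

    Hypothetical helper: `lines` is a list of text lines from a sample
    list; blank lines are ignored and the original row order is kept.
    """
    lines = [ln for ln in lines if ln.strip()]
    header, rows = (lines[:1], lines[1:]) if has_header else ([], lines)
    if n >= len(rows):
        return header + rows
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    chosen = sorted(rng.sample(range(len(rows)), n))
    return header + [rows[i] for i in chosen]


if __name__ == "__main__":
    # Made-up demo data; real files will have different names and paths.
    demo = ["SampleID\tAddress\tphenotype"]
    demo += [f"s{i}\t/path/s{i}.fasta\t{i % 2}" for i in range(10)]
    print("\n".join(subsample_rows(demo, 3)))
```

Rerunning with progressively larger subsets (for example 300, 1000, 2000) would narrow down the dataset size at which the error first appears.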