new "Data Table" widget correcting infinites

juliocoll commented 3 years ago

[ ] What's your use case?
I get a "read warning=array contains infinite" when I submit some of my training CSV or Excel files to the "File" and/or "Data Table" widgets before model building. However, there are no infinite values or NaNs in those files!

Yes. The presence of infinites in my input training data is only detected by the widgets, not fixed. Some of the model widgets do not even work under those circumstances. Despite my efforts to eliminate infinite or NaN values, I have had no success. It is unclear how the widgets recognize such virtual (?) infinites and which columns and/or rows contain them. It is frustrating. Furthermore, other closely related files of mine do not trigger such warnings, and I cannot see why! I was unable to use the models for predicting test files.

I am doing some modeling with fingerprinted docking data consisting of 1865 rows of ligands x 100 general predictor columns or x 1400 PADLE predictor columns.
The models with 100 general predictors predict the training set with only about 25% accuracy. The models with 1400 PADLE predictors improve to 60% accuracy, despite having infinites (???).
However, when any of the models derived from 100 or 1400 predictors is applied to test files without any binding data, infinite values seem to emerge from nowhere, and predicted values do not appear in the widgets at all.
I am trying to reduce the PADLE predictors by running PCA, but that is very slow with 1400 predictors, and I am not sure it will solve the problem. From reading numerous web articles, the infinites problem seems to run deeper than the high number of columns, since my own models with 100 columns did not work either.
[ ] What's your proposed solution?
I would like a new "Data Table" widget that not only says "Array contains infinites" but CORRECTS THEM or DELETES THEM, or at least tells where they are so they can be fixed manually when required.

Need a new "Data Table" widget correcting input infinites.
[ ] Are there any alternative solutions?
I do not know.
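One way to find out where such values are, outside of Orange, is to scan the exported CSV directly. The following is only a minimal sketch using the Python standard library; the function name `find_non_finite`, the column names and the sample values are made up for illustration and are not part of Orange.

```python
import csv
import io
import math


def find_non_finite(csv_text):
    """Return (row_number, column_name, raw_value) for every inf/NaN cell.

    Hypothetical helper for illustration; not an Orange function.
    """
    hits = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row numbers start at 1 for the first data row (the header is not counted).
    for row_number, row in enumerate(reader, start=1):
        for column, raw in row.items():
            try:
                value = float(raw)
            except (TypeError, ValueError):
                continue  # text, empty or missing cells are not numeric; skip
            if not math.isfinite(value):
                hits.append((row_number, column, raw))
    return hits


# Made-up sample: float() parses the text "Infinity" as an infinite value.
sample = "ID,feat_a,feat_b\n1,0.5,1.2\n2,Infinity,-Infinity\n"
for hit in find_non_finite(sample):
    print(hit)
```

Note that Python's `float()` accepts spellings such as "Infinity", "inf" and "nan" regardless of case, so cells that look like ordinary text in a spreadsheet can still be read as non-finite numbers.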

ajdapretnar commented 3 years ago

@juliocoll It would be best if you attached a sample of your data on which we can reproduce the issue (if possible) and a screenshot of your workflow.

juliocoll commented 3 years ago

Dear Ajda, thank you for your attention!

I am sending you the requested information, and more, via Dropbox, since when I downsized the large data set I was unable to reproduce the warning in a smaller sample.

Dropbox has changed its functions, so please let me know whether you receive it!

Thank you again for your attention

sincerely

julio


juliocoll commented 3 years ago

Dear Ajda,

this is a link to the dropbox, in case you need it: https://www.dropbox.com/sh/goctkpis63l88ir/AABiKk4YDzcnb7ba4XuliW9ga?dl=0


ajdapretnar commented 3 years ago

Your data definitely contains infinites: row 589, ID 2122, columns SssS and minssS.

Orange already replaces infinites with NaNs in Orange/data/table.py, L1763.

If there is any other issue with a failing workflow, please provide details and screenshots.
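The idea of replacing infinites with NaNs, as mentioned above, can be illustrated in plain Python. This is a sketch of the general technique only, not the code in Orange/data/table.py; the name `infs_to_nan` and the sample values are hypothetical.

```python
import math


def infs_to_nan(values):
    """Return a copy of `values` with +inf and -inf replaced by NaN.

    Illustration only; this is not Orange's implementation.
    """
    return [math.nan if math.isinf(v) else v for v in values]


# Made-up column of numeric values containing both signs of infinity.
column = [0.5, math.inf, -math.inf, 2.0]
cleaned = infs_to_nan(column)
print(cleaned)
```

After such a replacement the affected cells are NaN, which is commonly treated as a missing value, so the rest of the row remains usable.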

juliocoll commented 3 years ago

Thank you Ajda!

I could not imagine that the infinite values were so few in number and were written as the word "infinity"! I was only surprised that the models seem to work regardless of their presence being detected.

I assume that for future work I just have to search for "Infinity" in the whole data set and delete or substitute those values.

Simple solution :-)
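The search-and-substitute step described above could be sketched as follows, using only the Python standard library. The helper names `is_non_finite` and `clean_rows` are hypothetical, and the sample values are invented; only the column names and the ID come from this thread.

```python
import csv
import io
import math


def is_non_finite(cell):
    """True if the cell text parses as +/-infinity or NaN."""
    try:
        return not math.isfinite(float(cell))
    except ValueError:
        return False  # ordinary text or empty cell


def clean_rows(csv_text, drop_rows=True):
    """Drop rows with non-finite cells, or blank those cells out.

    Hypothetical helper for illustration; not part of Orange.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return ""  # empty input
    writer.writerow(header)  # header is copied unchanged
    for row in reader:
        bad = [is_non_finite(cell) for cell in row]
        if any(bad) and drop_rows:
            continue  # delete the whole row
        # Substitute: an empty cell is usually read as a missing value.
        writer.writerow(["" if b else cell for cell, b in zip(row, bad)])
    return out.getvalue()


# Invented values; column names and ID 2122 are taken from the thread.
sample = "ID,SssS,minssS\n2121,0.5,1.2\n2122,Infinity,-Infinity\n"
print(clean_rows(sample))
print(clean_rows(sample, drop_rows=False))
```

Dropping rows loses the other predictors of that ligand, while blanking the cells keeps the row and leaves the gaps to be handled as missing values. Cells that contain NaN text are treated the same way as infinities in this sketch.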

Thanks again

sincerely

julio


biolab / orange3

new "Data Table" widget correcting infinites #5156