bigomics / playbase

Core back-end functionality and logic for OmicsPlayground

Fix issue with incorrect readout of large numbers by read.as_matrix [CRITICAL] #100

Closed · mauromiguelm closed 7 months ago

mauromiguelm commented 7 months ago

This PR fixes https://github.com/bigomics/omicsplayground/issues/896 by changing the data.table encoding to avoid conflicts when importing large numbers, renaming the pgx read functions to match the file structure, and adding tests for normal and extreme values. The playbase documentation has been updated to reflect these changes.
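For reference, here is a minimal sketch of the kind of reader change involved, assuming the reader is built on data.table::fread. The function name and argument handling below are illustrative only, not the actual playbase implementation:

read_matrix_sketch <- function(file) {
  # force 64-bit integers to plain doubles instead of bit64::integer64
  dt <- data.table::fread(file, integer64 = "numeric", check.names = FALSE)
  m <- as.matrix(dt[, -1, with = FALSE])   # drop the row-name column
  rownames(m) <- as.character(dt[[1]])     # and use it as row names
  storage.mode(m) <- "numeric"             # guarantee a double matrix
  m
}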

Files used in the tests: test1.csv, test2.csv, large_integers.csv
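The exact fixture contents are not reproduced here, but based on the expected values in the test below, large_integers.csv presumably looks something like this (a hypothetical reconstruction; the first column header "gene" is assumed):

writeLines(c(
  "gene,sample1,sample2,sample3,sample4",
  "gene1,395000000,84760000000,2390000000,4680000000",
  "gene2,895050000,4760700000,1290000000,4680000000"
), "tests/data/large_integers.csv")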

Before update:

> read.as_matrix("./tests/data/large_integers.csv")
        sample1       sample2       sample3       sample4
gene1 395000000 4.187700e-313 1.180817e-314 2.312227e-314
gene2 895050000 2.352098e-314 6.373447e-315 2.312227e-314

After update:

> read.as_matrix("./tests/data/large_integers.csv")
        sample1    sample2  sample3  sample4
gene1 395000000 8.4760e+10 2.39e+09 4.68e+09
gene2 895050000 4.7607e+09 1.29e+09 4.68e+09
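One plausible explanation for the broken output (my reading, not stated in the PR): fread parses values above .Machine$integer.max as bit64::integer64, and when such columns are coerced to a plain matrix without conversion, the raw 64-bit integer bit patterns are reinterpreted as doubles, which shows up as subnormal values around 1e-313. A quick way to see this effect:

library(bit64)
x <- as.integer64("84760000000")
unclass(x)   # prints ~4.1877e-313, matching sample2/gene1 in the "before" output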

Test before update:

Error: as.numeric(read.as_matrix("./tests/data/large_integers.csv")) (`actual`) not equal to c(...) (`expected`).

    actual    | expected       
[1] 395000000 | 395000000   [1]
[2] 895050000 - 84760000000 [2]
[3] 0         - 2390000000  [3]
[4] 0         - 4680000000  [4]
[5] 0         - 895050000   [5]
[6] 0         - 4760700000  [6]
[7] 0         - 1290000000  [7]
[8] 0         - 4680000000  [8]

Test after update:

> expect_equal(
+     as.numeric(read.as_matrix("./tests/data/large_integers.csv")), 
+     c(395000000, 895050000, 84760000000,  4760700000,  2390000000, 1290000000, 4680000000, 4680000000)
+     )

# no issues
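For completeness, the check above could be wrapped as a regular testthat test. The test description and structure are assumptions; only the expected values come from this PR:

testthat::test_that("read.as_matrix handles values beyond the 32-bit integer range", {
  m <- read.as_matrix("./tests/data/large_integers.csv")
  testthat::expect_type(m, "double")
  testthat::expect_equal(
    as.numeric(m),
    c(395000000, 895050000, 84760000000, 4760700000,
      2390000000, 1290000000, 4680000000, 4680000000)
  )
})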