Open marcomass opened 7 years ago
@marcomass, as I understand it, BigWig is a binary, indexed version of the Wiggle format, and Wiggle is a compressed, less accurate version of BedGraph. Why don't we use BedGraph and always convert BigWig and Wig files to BedGraph?
@akaitoua You are right that BigWig can be converted to BedGraph (so BigWig is not less accurate than BedGraph). Yet BedGraph takes much more space than BigWig, so nobody uses it and everyone uses BigWig, as in the provided link.
In any case, this issue regards two aspects:
@marcomass, I checked it and here are more details. I suggest supporting only the BedGraph format, since it does not change our data model. Whenever we copy data into GMQL, we convert it to what we call GMQL_WIG: BedGraph in a columnar, binary format that GMQL can read and that is small in size compared to BedGraph. We then store GMQL_WIG in our repository.
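To make the columnar idea concrete, here is a minimal sketch of turning BedGraph text into one typed column per field. The actual GMQL_WIG layout is not described in this thread, so the function name `bedgraph_to_columns` and the choice of column types are assumptions for illustration only.

```python
from array import array


def bedgraph_to_columns(lines):
    """Parse BedGraph text lines into one column per field (hypothetical layout)."""
    chroms = []            # chromosome name per interval
    starts = array("q")    # 0-based start positions, 64-bit integers
    ends = array("q")      # end positions (exclusive), 64-bit integers
    values = array("d")    # signal value per interval, 64-bit floats
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.split()[:4]
        chroms.append(chrom)
        starts.append(int(start))
        ends.append(int(end))
        values.append(float(value))
    return chroms, starts, ends, values


sample = [
    "track type=bedGraph",
    "chr1\t0\t100\t0.5",
    "chr1\t100\t250\t1.25",
    "chr2\t10\t20\t3.0",
]
chroms, starts, ends, values = bedgraph_to_columns(sample)
```

The numeric columns could then be written out with `array.tofile`, which is one simple way to get a compact binary form; whether GMQL_WIG does anything similar is not stated here.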
The reason not to use BigWig and Wig in GMQL is that we perform different types of queries than others in the field. We always perform a full join between the reference and the experiment (the set of regions in the reference is almost equal in size to the experiment sample). If we start supporting interval joins (which is like selecting a small portion of the BigWig file), then it would be better to change GMQL to use an index, which would be faster in that case.
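The full pass over reference and experiment described above can be sketched as a single sweep over two sorted lists, with no index involved. This is not GMQL's implementation: the function name `map_mean_signal`, the single-chromosome input, and the overlap-weighted mean as the aggregate are all assumptions chosen for illustration.

```python
def map_mean_signal(genes, signal):
    """Overlap-weighted mean signal per reference region (illustrative only).

    genes:  list of (start, end), sorted by start, may overlap each other.
    signal: list of (start, end, value), sorted by start, non-overlapping.
    Both lists are assumed to be on the same chromosome, half-open coordinates.
    """
    results = []
    first = 0  # index of the first signal interval that can still overlap
    for g_start, g_end in genes:
        # Drop signal intervals that end before this gene starts; they
        # cannot overlap any later gene either, since genes are sorted.
        while first < len(signal) and signal[first][1] <= g_start:
            first += 1
        covered = 0
        total = 0.0
        i = first
        # Scan forward while signal intervals start before the gene ends.
        while i < len(signal) and signal[i][0] < g_end:
            s_start, s_end, value = signal[i]
            overlap = min(s_end, g_end) - max(s_start, g_start)
            if overlap > 0:
                covered += overlap
                total += overlap * value
            i += 1
        # None marks a reference region with no signal coverage.
        results.append(total / covered if covered else None)
    return results


genes = [(0, 100), (50, 150), (300, 400)]
signal = [(0, 50, 2.0), (50, 100, 4.0), (100, 200, 1.0)]
means = map_mean_signal(genes, signal)
```

Because every signal interval is read at least once, an index brings little benefit when the reference covers most of the sample; it starts to pay off only when a query touches a small portion of the file.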
@akaitoua Ok. How can we change the format to GMQL_WIG when copying data into GMQL? What must the input format of this transformation be? BEDGraph?
Do you think that using BEDGraph (or GMQL_WIG) as the experiment dataset of a MAP, with genes as the reference regions in the reference dataset (thus about 25,000 for human), would be handled by the current system with reasonable performance?
Enable the use of signal data samples (e.g. BigWig) in some operands, e.g. as the second operand of MAP (or in COVER, to be discussed). Possibly a specific "special" version of the defined MAP operator would be better.
Examples of BigWig files (from 0.6 to 1.5 GB) are available at https://www.encodeproject.org/experiments/ENCSR620VIC/