Closed marcomass closed 7 years ago
In GMQL, update the default schemas (and their xsd) for the supported standard formats (narrow/broadpeak, bed, bedgraph, gtf, vcf) accordingly; these schemas are used when users add their own datasets to the repository. Also update the schemas of the public datasets in the repository, as well as the reference schema used in Jorge's software to update the repository.
@OlgaGorlova Can you provide an example of how to specify the coordinateSystem attribute and value in the dataset schema?
@marcomass You do not have to specify it - GTF and VCF formats by default are read as 1-based, others as 0-based
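As a rough illustration of the default rule described in this comment (a sketch only, not GMQL's actual code; the function name and the format labels are hypothetical), the per-format default could be expressed as:

```python
# Formats that are read as 1-based by default, per the comment above.
ONE_BASED_FORMATS = {"gtf", "vcf"}


def default_coordinate_system(file_format: str) -> str:
    """Return the assumed default coordinate system for a format label.

    Hypothetical helper: GTF and VCF map to "1-based", every other
    format (bed, narrowpeak, tab-delimited, ...) maps to "0-based".
    """
    if file_format.lower() in ONE_BASED_FORMATS:
        return "1-based"
    return "0-based"
```

For example, `default_coordinate_system("GTF")` yields `"1-based"`, while `default_coordinate_system("bed")` yields `"0-based"`.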
@OlgaGorlova Ok, good. Yet, what about data in a generic tab-delimited format? Being user-defined, such data could use either of the two coordinate systems, so users need a way to declare in the schema which coordinate system their data use. Has this been enabled? If or when it has, please close this issue, which I reopened.
@acanakoglu @OlgaGorlova Can you provide an example of how to specify the coordinateSystem attribute and value in the dataset schema?
@OlgaGorlova @acanakoglu I reopened this issue because users need to be able to specify, in the xml schema of a dataset in a non-standard data format (i.e., tab-delimited), which coordinate system the dataset uses. Please implement it.
@OlgaGorlova, there was a bug in the schema creation; I corrected it and committed the fix. However, there is another problem to correct.
In the output, you are not passing coordinate system parameter to CLI? And also, are you setting the default type in the output for TAB and GTF as you mentioned above?
> You do not have to specify it - GTF and VCF formats by default are read as 1-based, others as 0-based
If you need more explanation, please let me know.
@acanakoglu, Thank you!
> In the output, you are not passing coordinate system parameter to CLI?
Yes, I committed changes to https://github.com/DEIB-GECO/GMQL/tree/Coordinate_System . Could you please try it?
> And also, are you setting the default type in the output for TAB and GTF as you mentioned above?
Yes, if you do not specify the coordinate system, the default type is used.
I tried it, and everything works in my tests. You can merge it into the main branch.
In the gmqlSchema tag of a dataset schema xml file, add the attribute coordinateSystem (which can take the values 0-based or 1-based). According to this attribute, handle dataset input and output in GMQL properly, translating (if needed) to and from the 0-based coordinate system used within GMQL. If no coordinateSystem attribute is specified in the dataset schema, use 0-based as the default value.
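A minimal sketch of how this could look, under several assumptions: only the gmqlSchema tag and the coordinateSystem attribute with its two values come from this issue. The wrapper element, the field entries, the omission of any XML namespace, and all function names are hypothetical. The conversion also assumes the usual convention that 1-based intervals are closed and 0-based intervals are half-open, so only the start position shifts.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema for a user-defined tab-delimited dataset.
# Only <gmqlSchema> and coordinateSystem are taken from the issue text.
SCHEMA_XML = """
<gmqlSchemaCollection name="my_dataset">
  <gmqlSchema type="tab" coordinateSystem="1-based">
    <field type="STRING">chr</field>
    <field type="LONG">left</field>
    <field type="LONG">right</field>
  </gmqlSchema>
</gmqlSchemaCollection>
"""

VALID_SYSTEMS = ("0-based", "1-based")


def read_coordinate_system(schema_xml: str) -> str:
    """Read coordinateSystem from the gmqlSchema tag, defaulting to 0-based."""
    root = ET.fromstring(schema_xml.strip())
    node = root if root.tag == "gmqlSchema" else root.find(".//gmqlSchema")
    if node is None:
        raise ValueError("no gmqlSchema tag found")
    value = node.get("coordinateSystem", "0-based")
    if value not in VALID_SYSTEMS:
        raise ValueError(f"unsupported coordinateSystem: {value!r}")
    return value


def to_internal(start: int, stop: int, coordinate_system: str) -> tuple:
    """Translate input coordinates to the internal 0-based system.

    Assumption: 1-based closed [start, stop] becomes 0-based half-open
    [start - 1, stop); 0-based input is passed through unchanged.
    """
    if coordinate_system == "1-based":
        return (start - 1, stop)
    return (start, stop)


def from_internal(start: int, stop: int, coordinate_system: str) -> tuple:
    """Translate internal 0-based coordinates back for output."""
    if coordinate_system == "1-based":
        return (start + 1, stop)
    return (start, stop)
```

With this sketch, a region written as 100 to 200 in a dataset declared 1-based would be held internally as (99, 200) and written back out as (100, 200), while a schema without the attribute would be treated as 0-based and left unchanged.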
Standard formats have their predefined coordinate systems, for:
https://www.biostars.org/p/6373/ https://genome.ucsc.edu/FAQ/FAQformat.html#format13