Closed xuebingjie1990 closed 3 years ago
Right now it requires the query parameters. I'm still working on when the query parameters are missing:
1. when start and end are not provided, it should return all regions of the provided chr
2. when no query parameters are provided, it should return all regions
I think it's safe to just require the query parameters -- so, skip your point 2... if they want the whole file they can hit the whole file and get it on their own. This is explicitly for subsets.
But I think the chrom-only approach is probably reasonable and not too hard to implement.
For the output, I think it would be more convenient to put it in tabular form.
This:
chr1 999812 1001072
chr1 999812 1001072
Not this:
[["chr1",999812,1001072],["chr1",999812,1001072]]
Or, maybe make that parameterizable, with the above as the default.
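To make the two output shapes concrete, here is a minimal sketch of a formatter that defaults to tabular text and can optionally emit the nested JSON list. The function name `format_regions` and the `fmt` parameter are hypothetical, and the tab separator is an assumption (the example above may have been rendered with spaces).

```python
import json


def format_regions(regions, fmt="bed"):
    """Render (chrom, start, end) tuples as tab-separated text or JSON.

    `fmt` is a hypothetical parameter name; "bed" (tabular) is the default.
    """
    if fmt == "bed":
        # One region per line, fields separated by tabs.
        return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in regions)
    if fmt == "json":
        # Compact nested-list form, matching the "Not this" example.
        return json.dumps([list(r) for r in regions], separators=(",", ":"))
    raise ValueError(f"unknown format: {fmt!r}")


regions = [("chr1", 999812, 1001072), ("chr1", 999812, 1001072)]
print(format_regions(regions), end="")
```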
in fact, shouldn't it come out the above way straight out of the software? I think there's no need to load the results into python, just return them directly. it will save us memory if you can just stream those results.
@xuebingjie1990 Can you provide an example of the API format as well?
since what we have is the bigBed format, to get all the regions of a given chrom, I think we need the chrom.sizes file to get the end coordinates. we can upload it to s3, or is there a way to get the content of the chrom.sizes file from refgenie?
another way is, instead of generating the bigBed files, maybe we should generate bigWig instead.
bedToBigBed can't just take chrom? If not I'd just say don't bother then. that does surprise me though.
That's a different data type, so I don't see what you mean. I don't think this makes sense, bigWig files don't store interval data.
Is there a reason to use query params here? I'd suggest these should be path params.
I'll change the chr to path param, and keep start and end as query params.
I was talking about pyBigWig. I'll switch to bigBedToBed.
???
When querying entries from bigBed files using pyBigWig, the chr, start, and end are all required.
Since you suggest using bigBedToBed, I'll use bigBedToBed instead of pyBigWig. But I don't know how I can just stream the result of bigBedToBed, since it requires an output path for saving the results to.
doesn't - or something stream to stdout?
it's stdout. the ucsc tools generally let you use stdout as the output file name to print to stdout. So, it's:
bigBedToBed file.bb stdout
I just confirmed you can just provide chrom: -chrom, -start, and -end are all optional.
You can just use -chrom=chr1 and leave off start/end to get everything on 1 chromosome.
Maybe this argues for keeping these as query params as you had, since they are all optional. that's fine with me, I guess
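Since all three filters are optional, the command line can be assembled from whichever query parameters were supplied. The following is a rough sketch only; the helper name `build_bigbedtobed_cmd` and its argument names are made up for illustration, and it relies on the -chrom, -start, and -end flags and the stdout output name mentioned in this thread.

```python
def build_bigbedtobed_cmd(bigbed_path, chrom=None, start=None, end=None):
    """Build an argv list for bigBedToBed that writes BED lines to stdout.

    Hypothetical helper: any filter left as None is simply omitted.
    """
    cmd = ["bigBedToBed"]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    # Passing "stdout" as the output file name makes the tool print to stdout.
    cmd += [bigbed_path, "stdout"]
    return cmd


print(build_bigbedtobed_cmd("file.bb", chrom="chr1"))
```

Passing the command as a list (rather than a shell string) avoids quoting problems if the values come straight from a request.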
Here's how you could return the result:
I think you will need to use something like asyncio to read the stdout of the subprocess and return the results asynchronously.
https://docs.python.org/3/library/asyncio-subprocess.html
I don't have direct experience doing this
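As a starting point for the asyncio idea, here is a minimal sketch that reads a subprocess's stdout line by line instead of loading it all into memory. It is not the project's implementation: `stream_stdout` is a hypothetical name, and the demo runs the Python interpreter as a stand-in command so it works without the UCSC tools installed.

```python
import asyncio
import sys


async def stream_stdout(cmd):
    """Yield the stdout of `cmd` one decoded line at a time."""
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE
    )
    # StreamReader supports `async for`, yielding one line per iteration.
    async for raw in proc.stdout:
        yield raw.decode()
    # Reap the process once its output is exhausted. A production version
    # would also need to handle a consumer that stops reading early.
    await proc.wait()


async def main():
    # Stand-in command (assumption); the real one would be something like
    # ["bigBedToBed", "-chrom=chr1", "file.bb", "stdout"].
    cmd = [sys.executable, "-c", "print('chr1\\t778543\\t779076')"]
    return [line async for line in stream_stdout(cmd)]


lines = asyncio.run(main())
print(lines)
```

In a web handler, the async generator could be handed to whatever streaming response type the framework provides, so lines go out as the tool produces them.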
I figured it out. It's working now, but the output has \t and \n in it. I'm trying to format that.
"chr1\t778543\t779076\t.\t1000\t.\t43.63674\t-1.0\t2.67082\t523\nchr1\t778543\t779076\t.\t1000\t.\t224.16478999999998\t-1.0\t4.66339\t319\nchr1\t804614\t805250\t.\t872\t.\t83.88604000000001\t-
......
might want to also pipe it through | cut -f1-3 just to be clean.
I think for the formatting thing, might just be a header type issue; return as plain text or something
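Both suggestions can be illustrated in a few lines. The sketch below keeps only the first three tab-separated columns, which is what `cut -f1-3` does, and shows that JSON-encoding a string is one way literal \t and \n end up in a response body. The function name `first_three_columns` is hypothetical, and whether JSON encoding is the actual cause here is an assumption.

```python
import json


def first_three_columns(bed_text):
    """Keep only chrom, start, end from tab-separated BED text (like `cut -f1-3`)."""
    kept = []
    for line in bed_text.splitlines():
        if line:
            kept.append("\t".join(line.split("\t")[:3]))
    return "\n".join(kept) + ("\n" if kept else "")


raw = "chr1\t778543\t779076\t.\t1000\t.\t43.63674\t-1.0\t2.67082\t523\n"
trimmed = first_three_columns(raw)

# JSON-encoding the string escapes tabs and newlines as \t and \n,
# whereas sending it as plain text would leave them as real whitespace.
encoded = json.dumps(trimmed)
print(encoded)
print(trimmed, end="")
```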
alright if this is working, I'd say release it to the dev server so we can see it in action. this will also fix the redirect issue, and let you test the new track hubs
yes it's working. I tested it again locally. I'll merge it to dev now
Region-based queries: Returns the queried regions with provided ID and optional query parameters (chr, start, end)
Right now it requires the query parameters. I'm still working on when the query parameters are missing: when start and end are not provided, it should return all regions of the provided chr.
current output: