After building the MSA using the blanchette00 example, I hoped to practice including conservation in the building of an assemblyHub. Unfortunately, this example does not include gene predictions with which to perform the 4D analysis. I had hoped that the example HUMAN sequence would be an actual excerpt from one of hg{17,18,19,38}, since that would have allowed me to extract the genes underlying the excerpt into bed12 format and run the --conservation analysis as part of hal2assemblyHub.py. Alas, blat/blast indicates that HUMAN is not in fact a single contiguous excerpt from any of these human reference genome builds, so I cannot complete my exercise with this example.
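For concreteness, here is a minimal sketch of the bed12 step I had in mind, assuming the excerpt had turned out to be a contiguous window of a reference build. The function name `rebase_bed12`, the window coordinates, and the gene record are all hypothetical and made up for illustration; only the bed12 column layout is standard. It keeps a gene only if it lies entirely inside the window and shifts its coordinates so they are relative to the excerpt.

```python
def rebase_bed12(line, region_chrom, region_start, region_end, new_name):
    """Shift one BED12 record into excerpt coordinates.

    Hypothetical helper: returns the rewritten line, or None when the
    record is not fully contained in region_chrom:region_start-region_end
    (0-based, half-open, as in BED).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected a 12-column BED record")

    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if chrom != region_chrom or start < region_start or end > region_end:
        return None

    # Rename the sequence to match the name used in the alignment.
    fields[0] = new_name
    # chromStart, chromEnd, thickStart, thickEnd are absolute positions.
    # blockStarts (column 12) are relative to chromStart and stay as-is.
    for i in (1, 2, 6, 7):
        fields[i] = str(int(fields[i]) - region_start)
    return "\t".join(fields)


# Made-up record and window, purely for illustration.
example = "chr7\t1100\t1500\tgeneA\t0\t+\t1150\t1450\t0\t2\t100,150,\t0,250,"
rebased = rebase_bed12(example, "chr7", 1000, 2000, "HUMAN")
print(rebased)
```

Genes that straddle the window boundary are simply dropped here rather than clipped, which seemed the safer choice for a 4D-site analysis.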
So, I am seeking a small to moderately sized data set on which to run hal2assemblyHub.py through to completion, allowing me to "prove" my intended approach before scaling the compute environment. (FWIW: I am currently getting my Grid Engine reconfigured to have one large-memory node and 50 or so 32 GB nodes, as recommended.)
Any pointers to such a data set with which to train myself and that has a known result would be much appreciated!