glennhickey / progressiveCactus

Distribution package for the Progressive Cactus multiple genome aligner. Dependencies are linked as submodules.

"Comparative Assembly Hub Pipeline" scaled down example sought #91

Open malcook opened 6 years ago

malcook commented 6 years ago

After building the MSA using the blanchette00 example, I hoped to practice including conservation in the building of an assemblyHub. Unfortunately, this example does not include gene predictions with which to perform the 4D analysis. I had hoped that the example HUMAN sequence was an actual excerpt from one of hg{17,18,19,38}, since that would have let me extract the genes underlying the excerpt into bed12 format and run the --conservation analysis as part of hal2assemblyHub.py. Alas, blat/blast indicates that HUMAN is not a single contiguous excerpt from any of these human reference genome builds, so I am stymied in completing my exercise with this example.
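
For illustration, had HUMAN been a contiguous excerpt of a reference build, the bed12 extraction step might have looked roughly like the sketch below. It rests on assumptions: the function name `rebase_bed12`, the region `chr1:900-2500`, and the sample record are all hypothetical, and it keeps only records that lie entirely inside the excerpt rather than clipping partial genes.

```python
def rebase_bed12(lines, chrom, start, end, new_name):
    """Keep BED12 records fully inside chrom:start-end (0-based, half-open)
    and shift them into the excerpt's own coordinate system.

    Hypothetical helper for illustration; not part of HAL or Cactus.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 12 or fields[0] != chrom:
            continue
        c_start, c_end = int(fields[1]), int(fields[2])
        # Drop genes that extend past either edge of the excerpt.
        if c_start < start or c_end > end:
            continue
        fields[0] = new_name
        fields[1] = str(c_start - start)          # chromStart
        fields[2] = str(c_end - start)            # chromEnd
        fields[6] = str(int(fields[6]) - start)   # thickStart
        fields[7] = str(int(fields[7]) - start)   # thickEnd
        # blockSizes and blockStarts are relative to chromStart, so they
        # need no adjustment.
        out.append("\t".join(fields))
    return out


if __name__ == "__main__":
    # Made-up record, for demonstration only.
    sample = ["chr1\t1000\t2000\tgeneA\t0\t+\t1100\t1900\t0\t2\t100,200\t0,800"]
    for rec in rebase_bed12(sample, "chr1", 900, 2500, "HUMAN"):
        print(rec)
```

Only the four absolute coordinates are shifted, because the BED block fields are already stored relative to the record's start.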

So, I am seeking a small or moderately sized data set on which to run hal2assemblyHub.py through to completion, allowing me to "prove" my intended approach before scaling up the compute environment. (FWIW: I am currently getting my Grid Engine reconfigured to have one large-memory node and 50 or so 32 GB nodes, as recommended.)

Any pointers to such a data set with which to train myself and that has a known result would be much appreciated!