Open WagnerNils opened 1 year ago
Continuing on this; I tried to use the strand information from the bed output (not using 'outPSL' flag). However, when lifting to other species the bed file output seems to be sometimes wrong and the PSL output correct (when comparing to what is displayed in USCS browser).
Input bed file: | chrom | Start | End | strand |
---|---|---|---|---|
chr10 | 237585 | 237586 | + |
halLiftover = './cactus/cactus-bin-v2.5.2/bin/halLiftover'
halfile = '605-vertebrate-2020.hal' #same for '241-mammalian-2020v2.hal'
human_bed = path_to_above_bed_file
target_species = 'Ateles_geoffroyi'
target_species_bed = path_to_liftover_bed_file
target_species_psl = path_to_liftover_psl_file
f'{halLiftover} {halfile} Homo_sapiens {human_bed} {target_species} {target_species_bed}'
Output (bed):
pd.read_csv(target_species_bed, sep='\t', names=['tName', 'tStart', 'tEnd', 'tStrand'])
tName | tStart | tEnd | tStrand |
---|---|---|---|
AteGeo_scaffold_8064 | 13504 | 13505 | + |
What is the best way to use the liftover functionality in cactus?
f'{halLiftover} --outPSL {halfile} Homo_sapiens {human_bed} {target_species} {target_species_psl}'
read_psl(target_species_psl)[['strand', 'qName', 'qStart', 'qEnd', 'tName', 'tStart', 'tEnd']]
Output (PSL): | strand | qName | qStart | qEnd | tName | tStart | tEnd |
---|---|---|---|---|---|---|---|
+- | chr10 | 237585 | 237586 | AteGeo_scaffold_8064 | 13504 | 13505 |
Double checking with the USCS browser for the target species the minus strand seems to be proper for this genomic region:
Hi, I want to liftover from human to any species in the zoonomia halfile. The liftover produces different results for the strand information depending on whether the '--outPSL' flag is used or not. E.g. when I perform a dummy liftover from human to human from the minus strand, the PSL output produces results on the plus strand. Without the '--outPSL' flag the minus strand is correctly reported. Here is a simple example to reproduce this:
Output (bed):