gkudla / hyb

hyb: a bioinformatics pipeline for the analysis of CLASH (crosslinking, ligation and sequencing of hybrids) data
GNU General Public License v3.0

Question regarding Hyb output #10

Open dstrib opened 6 months ago

dstrib (Daniel Stribling) commented 6 months ago (12 Dec 2023)

Hi! I am creating a new reference dataset for Hyb for use in my project, and I have a question about hybrids that are detected with the original hOH7 database but are missing from the output with my new one.

With hOH7, the ".blast" output from the hyb pipeline contains the following records (among others):

405_1    ENSG00000055609_ENST00000262189_MLL3_mRNA   100.00  32  0   0   44  75  1174    1205    4.3e-09 60.2
405_1    MIMAT0000244_MirBase_miR-30c_microRNA   100.00  22  0   0   23  44  1   22  0.0016  41.7

Result: the following hybrid is reported in the *_ua.hyb file:

405_1 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA . MIMAT0000244_MirBase_miR-30c_microRNA 23 44 1 22 0.0016 ENSG00000055609_ENST00000262189_MLL3_mRNA 44 75 1174 1205 4.3e-09
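The BLAST records above appear to use the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). Assuming that layout, this minimal Python sketch parses the two hOH7 records and orders the fragments along the read: the miR-30c hit (read positions 23-44) and the MLL3 hit (44-75) share a single position, so the two fragments tile the read end to end. `BlastHit` and `parse_hit` are illustrative names, not part of hyb.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One line of 12-column BLAST tabular output (assumed column order)."""
    query: str
    subject: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_hit(line: str) -> BlastHit:
    f = line.split()
    return BlastHit(
        f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
        int(f[6]), int(f[7]), int(f[8]), int(f[9]), float(f[10]), float(f[11]),
    )


old_hits = [
    parse_hit("405_1 ENSG00000055609_ENST00000262189_MLL3_mRNA 100.00 32 0 0 44 75 1174 1205 4.3e-09 60.2"),
    parse_hit("405_1 MIMAT0000244_MirBase_miR-30c_microRNA 100.00 22 0 0 23 44 1 22 0.0016 41.7"),
]

# Order the fragments by where they start on the read (5' to 3').
first, second = sorted(old_hits, key=lambda h: h.qstart)
# Positive: read positions claimed by both fragments; negative: size of the unaligned gap.
shared = first.qend - second.qstart + 1
print(first.subject, first.qstart, first.qend)
print(second.subject, second.qstart, second.qend)
print("shared positions:", shared)  # shared positions: 1
```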

With the updated database, the ".blast" output from the hyb pipeline contains the following records (among others):

405_1    ENSG00000055609_ENST00000262189_KMT2C_mRNA  100.00  32  0   0   44  75  1172    1203    1.2e-08 60.2
405_1    MIMAT0000244_miRBase_hsa-miR-30c-5p_microRNA    100.00  22  0   0   23  44  1   22  0.0044  41.7

Result: no hybrid is reported for read 405_1.

To rule out the e-value threshold as the cause, I ran hyb with hval=100.0. Since both databases give compatible BLAST hits for this read, I would expect the second database to yield a hybrid as well. Is there some other selection criterion I am missing that would prevent a record from being output in the second case? If not, why would no hybrid be reported here?
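One detail worth separating out: the bit scores are identical between the two runs (60.2 and 41.7), but the e-values are not. In BLAST statistics the expected value for a hit with bit score S' is E = m*n*2^(-S'), where m and n are the (effective) query and database lengths, so for the same alignment E grows linearly with the size of the searched database. A quick check on the numbers quoted above, assuming both runs used the same BLAST settings, shows both e-values rose by roughly the same factor (about 2.8, within the rounding of the printed values), consistent with a larger reference, and both stay far below hval=100.0.

```python
# E-values and bit scores copied from the .blast records above.
old = {"mRNA": (4.3e-09, 60.2), "miRNA": (0.0016, 41.7)}  # hOH7
new = {"mRNA": (1.2e-08, 60.2), "miRNA": (0.0044, 41.7)}  # updated database

HVAL = 100.0  # the hval setting used for the second run

for part in ("mRNA", "miRNA"):
    (e_old, s_old), (e_new, s_new) = old[part], new[part]
    assert s_old == s_new  # same alignment score in both runs
    ratio = e_new / e_old
    # With E = m * n * 2**(-S'), an unchanged bit score means the ratio tracks database size.
    print(f"{part}: e-value ratio {ratio:.2f}, below hval: {e_new <= HVAL}")
```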

Thanks much!

gkudla commented 6 months ago

Hi,

The hybrid-calling algorithm includes a step in which fragments of the read are masked when they are covered by a high-quality BLAST hit, and then become unavailable for hybrid calling. So one possible explanation is that your new database contains a high-quality non-chimeric match to your read, which prevents the read from being called as a hybrid.
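The masking step can be pictured as a greedy pass over the hits, best e-value first, where each accepted hit claims its stretch of the read. The sketch below is a simplified illustration of that general idea, not hyb's actual code: `greedy_mask` is a hypothetical helper, its `min_free` cut-off (the fraction of a hit's read span that must still be unmasked for the hit to be kept) is an assumed parameter, and the hits are taken from read 405_1 with the new database.

```python
def greedy_mask(hits, read_len, min_free=0.8):
    """Keep hits in order of increasing e-value, masking the read positions they cover.

    hits: (name, qstart, qend, evalue) tuples with 1-based, inclusive read coordinates.
    min_free: hypothetical fraction of a hit's span that must still be unmasked.
    """
    masked = [False] * (read_len + 1)  # index 0 unused so positions stay 1-based
    kept = []
    for name, qstart, qend, _ in sorted(hits, key=lambda h: h[3]):
        span = range(qstart, qend + 1)
        free = sum(not masked[i] for i in span) / len(span)
        if free >= min_free:
            kept.append(name)
            for i in span:
                masked[i] = True
    return kept


hits_405_1 = [
    ("KMT2C", 44, 75, 1.2e-08),
    ("BRI3BP", 1, 22, 0.0044),
    ("miR-30c-5p", 23, 44, 0.0044),
    ("IRF1-AS1", 1, 20, 0.058),
]
print(greedy_mask(hits_405_1, read_len=75))
```

Here IRF1-AS1 is discarded because read positions 1-20 are already claimed by the better-ranked BRI3BP hit. A strong non-chimeric hit spanning most of the read would suppress the chimeric fragments in the same way.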

To check if this is the case, please examine all lines matching read 405_1 in the blast file, not just the lines that correspond to your hybrid.
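A small helper for that check, assuming the 12-column tabular format shown above with the e-value in the 11th column: it collects every line for one read and sorts them so the best (lowest) e-values come first. The function name is hypothetical; a plain `grep` on the read ID returns the same lines, unsorted.

```python
def hits_for_read(lines, read_id):
    """Return the BLAST tabular rows for read_id, lowest (best) e-value first."""
    rows = [line.split() for line in lines]
    rows = [f for f in rows if f and f[0] == read_id]
    return sorted(rows, key=lambda f: float(f[10]))


sample = [
    "405_1 MIMAT0000244_miRBase_hsa-miR-30c-5p_microRNA 100.00 22 0 0 23 44 1 22 0.0044 41.7",
    "405_1 ENSG00000197536_ENST00000337752_IRF1-AS1_lncRNA 100.00 20 0 0 1 20 2077 2096 0.058 38.1",
    "405_1 ENSG00000055609_ENST00000262189_KMT2C_mRNA 100.00 32 0 0 44 75 1172 1203 1.2e-08 60.2",
]
for row in hits_for_read(sample, "405_1"):
    print(row[1], row[6], row[7], row[10])

# On a real file (file name is a placeholder):
# with open("file.blast") as fh:
#     rows = hits_for_read(fh, "405_1")
```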

Greg

dstrib commented 6 months ago (12 Dec 2023)

Thanks very much for the reply; that would make sense. I presume alignment quality is ranked by e-value. Scanning the other BLAST results for the second database, none has an e-value better than 1.2e-08, and every hit with that same score corresponds to the same alignment, with SAM CIGAR string 43S32M (there are no better alignments). So as far as I can tell, no better alignment is "overriding" the potential hybrid. Do you have any other thoughts?
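The correspondence between the BLAST query coordinates and the SAM CIGAR strings can be checked directly. For an ungapped, forward-strand hit (gap opens = 0 and SAM flag 0 or 256, as in all records in this thread) covering read positions qstart..qend, the soft-clipped CIGAR is (qstart-1)S, then (qend-qstart+1)M, then (read length - qend)S, with zero-length clips omitted. A minimal sketch under those assumptions, with a hypothetical helper name:

```python
READ = "AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA"


def cigar_from_query_span(qstart, qend, read_len):
    """Soft-clipped CIGAR for an ungapped, forward-strand hit on read positions qstart..qend (1-based)."""
    left = qstart - 1          # bases before the aligned block
    match = qend - qstart + 1  # aligned bases
    right = read_len - qend    # bases after the aligned block
    parts = []
    if left:
        parts.append(f"{left}S")
    parts.append(f"{match}M")
    if right:
        parts.append(f"{right}S")
    return "".join(parts)


n = len(READ)  # 75
print(cigar_from_query_span(44, 75, n))  # 43S32M (KMT2C hit)
print(cigar_from_query_span(23, 44, n))  # 22S22M31S (miR-30c-5p hit)
print(cigar_from_query_span(1, 22, n))   # 22M53S (BRI3BP hit)
```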

gkudla commented 6 months ago

Could there be something that overrides the microRNA part of the hybrid, i.e. a hit with an e-value between 1.2e-08 and 0.0044, perhaps partially overlapping the mRNA part?
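This suggestion amounts to a filter on the BLAST hits: an e-value inside the window (1.2e-08, 0.0044] and a read span overlapping the microRNA fragment (positions 23-44). The sketch below is a hypothetical helper applying that filter to a subset of the hits posted later in this thread for read 405_1; the window bounds and region come from this comment, and making the upper bound inclusive is a judgment call.

```python
def competing_hits(hits, ev_lo, ev_hi, region, exclude=()):
    """Names of hits with ev_lo < evalue <= ev_hi whose read span overlaps region.

    hits: (name, qstart, qend, evalue) with 1-based, inclusive read coordinates.
    region: (start, end) on the read; exclude: names to skip (e.g. the hit itself).
    """
    r_start, r_end = region
    return [
        name
        for name, qs, qe, ev in hits
        if name not in exclude and ev_lo < ev <= ev_hi and qs <= r_end and qe >= r_start
    ]


hits_405_1 = [
    ("KMT2C", 44, 75, 1.2e-08),
    ("BRI3BP", 1, 22, 0.0044),
    ("miR-30c-5p", 23, 44, 0.0044),
    ("IRF1-AS1", 1, 20, 0.058),
    ("CCDC186", 7, 26, 0.058),
    ("miR-30b-5p", 23, 40, 0.74),
]
mir_region = (23, 44)
print(competing_hits(hits_405_1, 1.2e-08, 0.0044, mir_region, exclude={"miR-30c-5p"}))
print(competing_hits(hits_405_1, 1.2e-08, 1.0, mir_region, exclude={"miR-30c-5p"}))
```

With these numbers the first call returns an empty list: the only other hit at 0.0044 (BRI3BP, positions 1-22) sits next to the microRNA fragment rather than on top of it. Widening the window picks up weaker overlapping hits such as CCDC186 (7-26) and miR-30b-5p (23-40).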

Greg

dstrib commented 6 months ago

There is one additional alignment, to a different mRNA (BRI3BP), with the same score as the microRNA hit; it does not overlap the microRNA:

(CIGAR: 22M53S) 405_1    ENSG00000184992_ENST00000341446_BRI3BP_mRNA 100.00  22  0   0   1   22  5012    5033    0.0044  41.7
(CIGAR: 22S22M31S) 405_1    MIMAT0000244_miRBase_hsa-miR-30c-5p_microRNA    100.00  22  0   0   23  44  1   22  0.0044  41.7

However, given that hyb is run in "mim" mode, I wouldn't expect it to prefer the former over the microRNA hit. Is that right?
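The BRI3BP and miR-30c-5p hits are tied on both e-value (0.0044) and bit score (41.7), so which one wins depends entirely on the tie-break. How hyb breaks such ties, and what "mim" mode changes, is not stated in this thread; the sketch below only illustrates two possibilities. Python's `sorted` is stable, so sorting on e-value alone keeps the BLAST file order (BRI3BP is listed first), while a hypothetical secondary key that prefers `_microRNA` subjects puts miR-30c-5p first.

```python
hits = [  # order as in the .blast file: BRI3BP is listed before miR-30c-5p
    ("ENSG00000184992_ENST00000341446_BRI3BP_mRNA", 1, 22, 0.0044, 41.7),
    ("MIMAT0000244_miRBase_hsa-miR-30c-5p_microRNA", 23, 44, 0.0044, 41.7),
]

# E-value only: ties keep input order because sorted() is stable.
by_evalue = sorted(hits, key=lambda h: h[3])

# Hypothetical tie-break: at equal e-value, microRNA subjects sort first (False < True).
prefer_mirna = sorted(hits, key=lambda h: (h[3], not h[0].endswith("_microRNA")))

print(by_evalue[0][0])
print(prefer_mirna[0][0])
```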

dstrib commented 6 months ago (12 Dec 2023)

Here are all relevant sam/blast entries:

file.sam:405_1  0   ENSG00000055609_ENST00000679882_KMT2C_mRNA  1105    0   43S32M  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:32 YT:Z:UU
file.sam:405_1  256 ENSG00000055609_ENST00000682283_KMT2C_mRNA  955 255 43S32M  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:32 YT:Z:UU
file.sam:405_1  256 ENSG00000055609_ENST00000684550_KMT2C_mRNA  1315    255 43S32M  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:32 YT:Z:UU
file.sam:405_1  256 ENSG00000055609_ENST00000683616_KMT2C_mRNA  1017    255 43S32M  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:32 YT:Z:UU
file.sam:405_1  256 ENSG00000290523_ENST00000470054_UNKGENE_lncRNA  430 255 43S32M  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:32 YT:Z:UU
file.sam:405_1  256 ENSG00000055609_ENST00000681082_KMT2C_mRNA  1175    255 43S32M  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:32 YT:Z:UU
file.sam:405_1  256 ENSG00000055609_ENST00000262189_KMT2C_mRNA  1172    255 43S32M  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:32 YT:Z:UU
file.sam:405_1  256 ENSG00000055609_ENST00000682916_KMT2C_mRNA  106 255 43S32M  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:32 YT:Z:UU
file.sam:405_1  256 ENSG00000055609_ENST00000683490_KMT2C_mRNA  1172    255 43S32M  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:32 YT:Z:UU
file.sam:405_1  256 ENSG00000187172_ENST00000496773_BAGE2_transcribed-unprocessed-pseudogene    106 255 43S32M  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:32 YT:Z:UU
file.sam:405_1  256 ENSG00000184992_ENST00000341446_BRI3BP_mRNA 5012    255 22M53S  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:22 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:22 YT:Z:UU
file.sam:405_1  256 MIMAT0000244_miRBase_hsa-miR-30c-5p_microRNA    1   255 22S22M31S   *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:22 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:22 YT:Z:UU
file.sam:405_1  256 ENSG00000197536_ENST00000337752_IRF1-AS1_lncRNA 2077    255 20M55S  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YT:Z:UU
file.sam:405_1  256 ENSG00000165813_ENST00000369287_CCDC186_mRNA    7089    255 6S20M49S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YT:Z:UU
file.sam:405_1  256 ENSG00000286449_ENST00000668926_UNKGENE_lncRNA  3274    255 20M55S  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YT:Z:UU
file.sam:405_1  256 ENSG00000165813_ENST00000648613_CCDC186_mRNA    7243    255 6S20M49S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YT:Z:UU
file.sam:405_1  256 ENSG00000167554_ENST00000601151_ZNF610_mRNA 2497    255 2S20M53S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YT:Z:UU
file.sam:405_1  256 ENSG00000153930_ENST00000682825_ANKFN1_mRNA 8550    255 20M55S  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YT:Z:UU
file.sam:405_1  256 ENSG00000167554_ENST00000403906_ZNF610_mRNA 2695    255 2S20M53S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:20 YT:Z:UU
file.sam:405_1  256 ENSG00000253897_ENST00000517788_UNKGENE_transcribed-unprocessed-pseudogene  58019   255 6S19M50S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:19 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:19 YT:Z:UU
file.sam:405_1  256 ENSG00000231752_ENST00000458200_EMBP1_transcribed-unprocessed-pseudogene    35362   255 6S19M50S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:19 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:19 YT:Z:UU
file.sam:405_1  256 ENSG00000186591_ENST00000649897_UBE2H_mRNA  1531    255 6S19M50S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:19 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:19 YT:Z:UU
file.sam:405_1  256 ENSG00000269821_ENST00000597346_KCNQ1OT1_lncRNA 72241   255 22M53S  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:19 XS:i:32 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:4T17   YT:Z:UU
file.sam:405_1  256 ENSG00000231752_ENST00000458200_EMBP1_transcribed-unprocessed-pseudogene    36333   255 6S19M50S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:19 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:19 YT:Z:UU
file.sam:405_1  256 ENSG00000186591_ENST00000355621_UBE2H_mRNA  1878    255 6S19M50S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:19 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:19 YT:Z:UU
file.sam:405_1  256 ENSG00000276521_ENST00000615334_UNKGENE_unprocessed-pseudogene  777 255 1S21M53S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:3G17   YT:Z:UU
file.sam:405_1  256 ENSG00000240438_ENST00000447585_OFD1P5Y_unprocessed-pseudogene  12245   255 2S18M55S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:18 YT:Z:UU
file.sam:405_1  256 ENSG00000242153_ENST00000451061_OFD1P6Y_transcribed-unprocessed-pseudogene  29535   255 2S18M55S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:18 YT:Z:UU
file.sam:405_1  256 ENSG00000271519_ENST00000603371_UNKGENE_processed-pseudogene    4271    255 2S18M55S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:18 YT:Z:UU
file.sam:405_1  256 ENSG00000233321_ENST00000685074_LINC02669_lncRNA    1632    255 2S18M55S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:18 YT:Z:UU
file.sam:405_1  256 MIMAT0000420_miRBase_hsa-miR-30b-5p_microRNA    1   255 22S18M35S   *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:18 YT:Z:UU
file.sam:405_1  256 ENSG00000106771_ENST00000374586_TMEM245_mRNA    6821    255 18M57S  *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:18 YT:Z:UU
file.sam:405_1  256 ENSG00000233963_ENST00000426792_ATP8A2P3_unprocessed-pseudogene 11220   255 7S18M50S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:18 YT:Z:UU
file.sam:405_1  256 ENSG00000006432_ENST00000554752_MAP3K9_mRNA 8182    255 2S18M55S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:18 YT:Z:UU
file.sam:405_1  256 ENSG00000175387_ENST00000262160_SMAD2_mRNA  12609   255 5S18M52S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:18 YT:Z:UU
file.sam:405_1  256 ENSG00000288025_ENST00000664814_UNKGENE_lncRNA  1693    255 2S18M55S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:18 YT:Z:UU
file.sam:405_1  256 ENSG00000255185_ENST00000534700_PDXDC2P_transcribed-unprocessed-pseudogene  16357   255 7S18M50S    *   0   0   AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:18 YT:Z:UU

file.blast:405_1    ENSG00000055609_ENST00000679882_KMT2C_mRNA  100.00  32  0   0   44  75  1105    1136    1.2e-08 60.2
file.blast:405_1    ENSG00000055609_ENST00000682283_KMT2C_mRNA  100.00  32  0   0   44  75  955 986 1.2e-08 60.2
file.blast:405_1    ENSG00000055609_ENST00000684550_KMT2C_mRNA  100.00  32  0   0   44  75  1315    1346    1.2e-08 60.2
file.blast:405_1    ENSG00000055609_ENST00000683616_KMT2C_mRNA  100.00  32  0   0   44  75  1017    1048    1.2e-08 60.2
file.blast:405_1    ENSG00000290523_ENST00000470054_UNKGENE_lncRNA  100.00  32  0   0   44  75  430 461 1.2e-08 60.2
file.blast:405_1    ENSG00000055609_ENST00000681082_KMT2C_mRNA  100.00  32  0   0   44  75  1175    1206    1.2e-08 60.2
file.blast:405_1    ENSG00000055609_ENST00000262189_KMT2C_mRNA  100.00  32  0   0   44  75  1172    1203    1.2e-08 60.2
file.blast:405_1    ENSG00000055609_ENST00000682916_KMT2C_mRNA  100.00  32  0   0   44  75  106 137 1.2e-08 60.2
file.blast:405_1    ENSG00000055609_ENST00000683490_KMT2C_mRNA  100.00  32  0   0   44  75  1172    1203    1.2e-08 60.2
file.blast:405_1    ENSG00000187172_ENST00000496773_BAGE2_transcribed-unprocessed-pseudogene    100.00  32  0   0   44  75  106 137 1.2e-08 60.2
file.blast:405_1    ENSG00000184992_ENST00000341446_BRI3BP_mRNA 100.00  22  0   0   1   22  5012    5033    0.0044  41.7
file.blast:405_1    MIMAT0000244_miRBase_hsa-miR-30c-5p_microRNA    100.00  22  0   0   23  44  1   22  0.0044  41.7
file.blast:405_1    ENSG00000197536_ENST00000337752_IRF1-AS1_lncRNA 100.00  20  0   0   1   20  2077    2096    0.058   38.1
file.blast:405_1    ENSG00000165813_ENST00000369287_CCDC186_mRNA    100.00  20  0   0   7   26  7089    7108    0.058   38.1
file.blast:405_1    ENSG00000286449_ENST00000668926_UNKGENE_lncRNA  100.00  20  0   0   1   20  3274    3293    0.058   38.1
file.blast:405_1    ENSG00000165813_ENST00000648613_CCDC186_mRNA    100.00  20  0   0   7   26  7243    7262    0.058   38.1
file.blast:405_1    ENSG00000167554_ENST00000601151_ZNF610_mRNA 100.00  20  0   0   3   22  2497    2516    0.058   38.1
file.blast:405_1    ENSG00000153930_ENST00000682825_ANKFN1_mRNA 100.00  20  0   0   1   20  8550    8569    0.058   38.1
file.blast:405_1    ENSG00000167554_ENST00000403906_ZNF610_mRNA 100.00  20  0   0   3   22  2695    2714    0.058   38.1
file.blast:405_1    ENSG00000253897_ENST00000517788_UNKGENE_transcribed-unprocessed-pseudogene  100.00  19  0   0   7   25  58019   58037   0.21    36.2
file.blast:405_1    ENSG00000231752_ENST00000458200_EMBP1_transcribed-unprocessed-pseudogene    100.00  19  0   0   7   25  35362   35380   0.21    36.2
file.blast:405_1    ENSG00000186591_ENST00000649897_UBE2H_mRNA  100.00  19  0   0   7   25  1531    1549    0.21    36.2
file.blast:405_1    ENSG00000269821_ENST00000597346_KCNQ1OT1_lncRNA 95.45   22  1   0   1   22  72241   72262   0.21    36.2
file.blast:405_1    ENSG00000231752_ENST00000458200_EMBP1_transcribed-unprocessed-pseudogene    100.00  19  0   0   7   25  36333   36351   0.21    36.2
file.blast:405_1    ENSG00000186591_ENST00000355621_UBE2H_mRNA  100.00  19  0   0   7   25  1878    1896    0.21    36.2
file.blast:405_1    ENSG00000276521_ENST00000615334_UNKGENE_unprocessed-pseudogene  95.24   21  1   0   2   22  777 797 0.74    34.4
file.blast:405_1    ENSG00000240438_ENST00000447585_OFD1P5Y_unprocessed-pseudogene  100.00  18  0   0   3   20  12245   12262   0.74    34.4
file.blast:405_1    ENSG00000242153_ENST00000451061_OFD1P6Y_transcribed-unprocessed-pseudogene  100.00  18  0   0   3   20  29535   29552   0.74    34.4
file.blast:405_1    ENSG00000271519_ENST00000603371_UNKGENE_processed-pseudogene    100.00  18  0   0   3   20  4271    4288    0.74    34.4
file.blast:405_1    ENSG00000233321_ENST00000685074_LINC02669_lncRNA    100.00  18  0   0   3   20  1632    1649    0.74    34.4
file.blast:405_1    MIMAT0000420_miRBase_hsa-miR-30b-5p_microRNA    100.00  18  0   0   23  40  1   18  0.74    34.4
file.blast:405_1    ENSG00000106771_ENST00000374586_TMEM245_mRNA    100.00  18  0   0   1   18  6821    6838    0.74    34.4
file.blast:405_1    ENSG00000233963_ENST00000426792_ATP8A2P3_unprocessed-pseudogene 100.00  18  0   0   8   25  11220   11237   0.74    34.4
file.blast:405_1    ENSG00000006432_ENST00000554752_MAP3K9_mRNA 100.00  18  0   0   3   20  8182    8199    0.74    34.4
file.blast:405_1    ENSG00000175387_ENST00000262160_SMAD2_mRNA  100.00  18  0   0   6   23  12609   12626   0.74    34.4
file.blast:405_1    ENSG00000288025_ENST00000664814_UNKGENE_lncRNA  100.00  18  0   0   3   20  1693    1710    0.74    34.4
file.blast:405_1    ENSG00000255185_ENST00000534700_PDXDC2P_transcribed-unprocessed-pseudogene  100.00  18  0   0   8   25  16357   16374   0.74    34.4

gkudla commented 6 months ago

OK, there are two reasons why this hybrid is not detected:

  1. By default, hybrids are only detected for reads with at most 10 matches in the BLAST file. You can change this with the hmax setting on the command line, e.g. hmax=100.
  2. For this particular read, hybrids are not detected because of this entry in the blast file:

405_1 ENSG00000184992_ENST00000341446_BRI3BP_mRNA 100.00 22 0 0 1 22 5012 5033 0.0044 41.7

This is because hyb decides that this is a hybrid between KMT2C_mRNA and BRI3BP_mRNA, but then rejects it because of the large gap in the read between the two fragments (BRI3BP covers read positions 1-22, KMT2C positions 44-75). This is a consequence of how the hyb algorithm works and I have no simple fix; a workaround would be to delete BRI3BP from your mapping database.
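Both reasons can be checked from the BLAST file alone. The sketch below, with hypothetical helper names, flags reads whose hit count exceeds hmax (default 10, as stated above) and measures the unaligned gap between two fragments. Read 405_1 has 37 lines in the BLAST listing posted above, well over the default. For the rejected KMT2C/BRI3BP pairing the gap is 21 nt (read positions 23-43); the exact gap limit hyb applies is not given in this thread, so no threshold is hard-coded here.

```python
from collections import Counter

HMAX_DEFAULT = 10  # default hit limit per read, per the reply above


def reads_over_hmax(blast_lines, hmax=HMAX_DEFAULT):
    """Map read ID -> number of BLAST lines, for reads with more than hmax lines."""
    counts = Counter(line.split()[0] for line in blast_lines if line.strip())
    return {read: n for read, n in counts.items() if n > hmax}


def read_gap(frag_a, frag_b):
    """Unaligned read positions between two (qstart, qend) fragments; negative means overlap."""
    (_, end1), (start2, _) = sorted([frag_a, frag_b])
    return start2 - end1 - 1


print(read_gap((1, 22), (44, 75)))   # BRI3BP + KMT2C: 21
print(read_gap((23, 44), (44, 75)))  # miR-30c-5p + KMT2C: -1 (one shared position)

# On a real file (file name is a placeholder):
# with open("file.blast") as fh:
#     print(reads_over_hmax(fh))
```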

Hope that helps,
Greg

On Tue, 12 Dec 2023 at 21:10, Daniel Stribling @.***> wrote:

Here are all relevant sam/blast entries:

file.sam:405_1 0 ENSG00000055609_ENST00000679882_KMT2C_mRNA 1105 0 43S32M 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU file.sam:405_1 256 ENSG00000055609_ENST00000682283_KMT2C_mRNA 955 255 43S32M 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU file.sam:405_1 256 ENSG00000055609_ENST00000684550_KMT2C_mRNA 1315 255 43S32M 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU file.sam:405_1 256 ENSG00000055609_ENST00000683616_KMT2C_mRNA 1017 255 43S32M 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU file.sam:405_1 256 ENSG00000290523_ENST00000470054_UNKGENE_lncRNA 430 255 43S32M 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU file.sam:405_1 256 ENSG00000055609_ENST00000681082_KMT2C_mRNA 1175 255 43S32M 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU file.sam:405_1 256 ENSG00000055609_ENST00000262189_KMT2C_mRNA 1172 255 43S32M 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA 
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU file.sam:405_1 256 ENSG00000055609_ENST00000682916_KMT2C_mRNA 106 255 43S32M 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU file.sam:405_1 256 ENSG00000055609_ENST00000683490_KMT2C_mRNA 1172 255 43S32M 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU file.sam:405_1 256 ENSG00000187172_ENST00000496773_BAGE2_transcribed-unprocessed-pseudogene 106 255 43S32M 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:32 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU file.sam:405_1 256 ENSG00000184992_ENST00000341446_BRI3BP_mRNA 5012 255 22M53S 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:22 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:22 YT:Z:UU file.sam:405_1 256 MIMAT0000244_miRBase_hsa-miR-30c-5p_microRNA 1 255 22S22M31S 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:22 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:22 YT:Z:UU file.sam:405_1 256 ENSG00000197536_ENST00000337752_IRF1-AS1_lncRNA 2077 255 20M55S 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:20 YT:Z:UU file.sam:405_1 
256 ENSG00000165813_ENST00000369287_CCDC186_mRNA 7089 255 6S20M49S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:20 YT:Z:UU
file.sam:405_1 256 ENSG00000286449_ENST00000668926_UNKGENE_lncRNA 3274 255 20M55S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:20 YT:Z:UU
file.sam:405_1 256 ENSG00000165813_ENST00000648613_CCDC186_mRNA 7243 255 6S20M49S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:20 YT:Z:UU
file.sam:405_1 256 ENSG00000167554_ENST00000601151_ZNF610_mRNA 2497 255 2S20M53S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:20 YT:Z:UU
file.sam:405_1 256 ENSG00000153930_ENST00000682825_ANKFN1_mRNA 8550 255 20M55S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:20 YT:Z:UU
file.sam:405_1 256 ENSG00000167554_ENST00000403906_ZNF610_mRNA 2695 255 2S20M53S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:20 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:20 YT:Z:UU
file.sam:405_1 256 ENSG00000253897_ENST00000517788_UNKGENE_transcribed-unprocessed-pseudogene 58019 255 6S19M50S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:19 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:19 YT:Z:UU
file.sam:405_1 256 ENSG00000231752_ENST00000458200_EMBP1_transcribed-unprocessed-pseudogene 35362 255 6S19M50S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:19 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:19 YT:Z:UU
file.sam:405_1 256 ENSG00000186591_ENST00000649897_UBE2H_mRNA 1531 255 6S19M50S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:19 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:19 YT:Z:UU
file.sam:405_1 256 ENSG00000269821_ENST00000597346_KCNQ1OT1_lncRNA 72241 255 22M53S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:19 XS:i:32 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:4T17 YT:Z:UU
file.sam:405_1 256 ENSG00000231752_ENST00000458200_EMBP1_transcribed-unprocessed-pseudogene 36333 255 6S19M50S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:19 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:19 YT:Z:UU
file.sam:405_1 256 ENSG00000186591_ENST00000355621_UBE2H_mRNA 1878 255 6S19M50S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:19 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:19 YT:Z:UU
file.sam:405_1 256 ENSG00000276521_ENST00000615334_UNKGENE_unprocessed-pseudogene 777 255 1S21M53S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:3G17 YT:Z:UU
file.sam:405_1 256 ENSG00000240438_ENST00000447585_OFD1P5Y_unprocessed-pseudogene 12245 255 2S18M55S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:18 YT:Z:UU
file.sam:405_1 256 ENSG00000242153_ENST00000451061_OFD1P6Y_transcribed-unprocessed-pseudogene 29535 255 2S18M55S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:18 YT:Z:UU
file.sam:405_1 256 ENSG00000271519_ENST00000603371_UNKGENE_processed-pseudogene 4271 255 2S18M55S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:18 YT:Z:UU
file.sam:405_1 256 ENSG00000233321_ENST00000685074_LINC02669_lncRNA 1632 255 2S18M55S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:18 YT:Z:UU
file.sam:405_1 256 MIMAT0000420_miRBase_hsa-miR-30b-5p_microRNA 1 255 22S18M35S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:18 YT:Z:UU
file.sam:405_1 256 ENSG00000106771_ENST00000374586_TMEM245_mRNA 6821 255 18M57S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:18 YT:Z:UU
file.sam:405_1 256 ENSG00000233963_ENST00000426792_ATP8A2P3_unprocessed-pseudogene 11220 255 7S18M50S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:18 YT:Z:UU
file.sam:405_1 256 ENSG00000006432_ENST00000554752_MAP3K9_mRNA 8182 255 2S18M55S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:18 YT:Z:UU
file.sam:405_1 256 ENSG00000175387_ENST00000262160_SMAD2_mRNA 12609 255 5S18M52S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:18 YT:Z:UU
file.sam:405_1 256 ENSG00000288025_ENST00000664814_UNKGENE_lncRNA 1693 255 2S18M55S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:18 YT:Z:UU
file.sam:405_1 256 ENSG00000255185_ENST00000534700_PDXDC2P_transcribed-unprocessed-pseudogene 16357 255 7S18M50S * 0 0 AAAAAAGTGTGTGTGTGTGTATTGTAAACATCCTACACTCTCAGATTTCAGTCACATCTTCCTGCTTTGTCCAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:18 XS:i:32 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:18 YT:Z:UU
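To line these SAM records up with the BLAST hits, each CIGAR string can be converted into the read interval it covers: leading clips shift the start, and M, I, = and X operations consume read bases. The following is a minimal Python sketch, assuming forward-strand alignments (flag 256 marks a secondary alignment and does not set the 0x10 reverse-strand bit); the helper name `query_span` is hypothetical and not part of hyb.

```python
import re

# One CIGAR operation: a length followed by an op code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_span(cigar: str) -> tuple[int, int]:
    """Return the 1-based, inclusive read interval of a forward-strand alignment."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    start = 1
    i = 0
    # Leading clips (S or H) precede the aligned part of the read.
    while i < len(ops) and ops[i][1] in "SH":
        start += ops[i][0]
        i += 1
    # Only M, I, = and X consume read bases; trailing clips are not counted.
    aligned = sum(n for n, op in ops[i:] if op in "MI=X")
    return start, start + aligned - 1


# CIGAR strings copied from the records above.
for target, cigar in [
    ("CCDC186_mRNA", "6S20M49S"),
    ("KCNQ1OT1_lncRNA", "22M53S"),
    ("hsa-miR-30b-5p_microRNA", "22S18M35S"),
]:
    print(target, cigar, query_span(cigar))
```

For example, 6S20M49S gives read positions 7 to 26, the same interval BLAST reports for CCDC186_mRNA (qstart 7, qend 26), and 22S18M35S gives 23 to 40, matching the hsa-miR-30b-5p hit.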

file.blast:405_1 ENSG00000055609_ENST00000679882_KMT2C_mRNA 100.00 32 0 0 44 75 1105 1136 1.2e-08 60.2
file.blast:405_1 ENSG00000055609_ENST00000682283_KMT2C_mRNA 100.00 32 0 0 44 75 955 986 1.2e-08 60.2
file.blast:405_1 ENSG00000055609_ENST00000684550_KMT2C_mRNA 100.00 32 0 0 44 75 1315 1346 1.2e-08 60.2
file.blast:405_1 ENSG00000055609_ENST00000683616_KMT2C_mRNA 100.00 32 0 0 44 75 1017 1048 1.2e-08 60.2
file.blast:405_1 ENSG00000290523_ENST00000470054_UNKGENE_lncRNA 100.00 32 0 0 44 75 430 461 1.2e-08 60.2
file.blast:405_1 ENSG00000055609_ENST00000681082_KMT2C_mRNA 100.00 32 0 0 44 75 1175 1206 1.2e-08 60.2
file.blast:405_1 ENSG00000055609_ENST00000262189_KMT2C_mRNA 100.00 32 0 0 44 75 1172 1203 1.2e-08 60.2
file.blast:405_1 ENSG00000055609_ENST00000682916_KMT2C_mRNA 100.00 32 0 0 44 75 106 137 1.2e-08 60.2
file.blast:405_1 ENSG00000055609_ENST00000683490_KMT2C_mRNA 100.00 32 0 0 44 75 1172 1203 1.2e-08 60.2
file.blast:405_1 ENSG00000187172_ENST00000496773_BAGE2_transcribed-unprocessed-pseudogene 100.00 32 0 0 44 75 106 137 1.2e-08 60.2
file.blast:405_1 ENSG00000184992_ENST00000341446_BRI3BP_mRNA 100.00 22 0 0 1 22 5012 5033 0.0044 41.7
file.blast:405_1 MIMAT0000244_miRBase_hsa-miR-30c-5p_microRNA 100.00 22 0 0 23 44 1 22 0.0044 41.7
file.blast:405_1 ENSG00000197536_ENST00000337752_IRF1-AS1_lncRNA 100.00 20 0 0 1 20 2077 2096 0.058 38.1
file.blast:405_1 ENSG00000165813_ENST00000369287_CCDC186_mRNA 100.00 20 0 0 7 26 7089 7108 0.058 38.1
file.blast:405_1 ENSG00000286449_ENST00000668926_UNKGENE_lncRNA 100.00 20 0 0 1 20 3274 3293 0.058 38.1
file.blast:405_1 ENSG00000165813_ENST00000648613_CCDC186_mRNA 100.00 20 0 0 7 26 7243 7262 0.058 38.1
file.blast:405_1 ENSG00000167554_ENST00000601151_ZNF610_mRNA 100.00 20 0 0 3 22 2497 2516 0.058 38.1
file.blast:405_1 ENSG00000153930_ENST00000682825_ANKFN1_mRNA 100.00 20 0 0 1 20 8550 8569 0.058 38.1
file.blast:405_1 ENSG00000167554_ENST00000403906_ZNF610_mRNA 100.00 20 0 0 3 22 2695 2714 0.058 38.1
file.blast:405_1 ENSG00000253897_ENST00000517788_UNKGENE_transcribed-unprocessed-pseudogene 100.00 19 0 0 7 25 58019 58037 0.21 36.2
file.blast:405_1 ENSG00000231752_ENST00000458200_EMBP1_transcribed-unprocessed-pseudogene 100.00 19 0 0 7 25 35362 35380 0.21 36.2
file.blast:405_1 ENSG00000186591_ENST00000649897_UBE2H_mRNA 100.00 19 0 0 7 25 1531 1549 0.21 36.2
file.blast:405_1 ENSG00000269821_ENST00000597346_KCNQ1OT1_lncRNA 95.45 22 1 0 1 22 72241 72262 0.21 36.2
file.blast:405_1 ENSG00000231752_ENST00000458200_EMBP1_transcribed-unprocessed-pseudogene 100.00 19 0 0 7 25 36333 36351 0.21 36.2
file.blast:405_1 ENSG00000186591_ENST00000355621_UBE2H_mRNA 100.00 19 0 0 7 25 1878 1896 0.21 36.2
file.blast:405_1 ENSG00000276521_ENST00000615334_UNKGENE_unprocessed-pseudogene 95.24 21 1 0 2 22 777 797 0.74 34.4
file.blast:405_1 ENSG00000240438_ENST00000447585_OFD1P5Y_unprocessed-pseudogene 100.00 18 0 0 3 20 12245 12262 0.74 34.4
file.blast:405_1 ENSG00000242153_ENST00000451061_OFD1P6Y_transcribed-unprocessed-pseudogene 100.00 18 0 0 3 20 29535 29552 0.74 34.4
file.blast:405_1 ENSG00000271519_ENST00000603371_UNKGENE_processed-pseudogene 100.00 18 0 0 3 20 4271 4288 0.74 34.4
file.blast:405_1 ENSG00000233321_ENST00000685074_LINC02669_lncRNA 100.00 18 0 0 3 20 1632 1649 0.74 34.4
file.blast:405_1 MIMAT0000420_miRBase_hsa-miR-30b-5p_microRNA 100.00 18 0 0 23 40 1 18 0.74 34.4
file.blast:405_1 ENSG00000106771_ENST00000374586_TMEM245_mRNA 100.00 18 0 0 1 18 6821 6838 0.74 34.4
file.blast:405_1 ENSG00000233963_ENST00000426792_ATP8A2P3_unprocessed-pseudogene 100.00 18 0 0 8 25 11220 11237 0.74 34.4
file.blast:405_1 ENSG00000006432_ENST00000554752_MAP3K9_mRNA 100.00 18 0 0 3 20 8182 8199 0.74 34.4
file.blast:405_1 ENSG00000175387_ENST00000262160_SMAD2_mRNA 100.00 18 0 0 6 23 12609 12626 0.74 34.4
file.blast:405_1 ENSG00000288025_ENST00000664814_UNKGENE_lncRNA 100.00 18 0 0 3 20 1693 1710 0.74 34.4
file.blast:405_1 ENSG00000255185_ENST00000534700_PDXDC2P_transcribed-unprocessed-pseudogene 100.00 18 0 0 8 25 16357 16374 0.74 34.4
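These records follow the standard 12-column BLAST tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). A minimal Python sketch for grouping one read's hits by e-value, so that ties such as BRI3BP_mRNA and hsa-miR-30c-5p (both 0.0044) stand out, could look like the following. It assumes the `file.blast:` prefix added by grep has been removed, and the function names are hypothetical, not part of hyb.

```python
from collections import defaultdict
from typing import NamedTuple

# Column order of BLAST tabular output (-outfmt 6; legacy -m 8).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


class BlastHit(NamedTuple):
    qseqid: str
    sseqid: str
    qstart: int
    qend: int
    evalue: float
    bitscore: float


def parse_blast_line(line: str) -> BlastHit:
    cols = line.split()
    if len(cols) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
    rec = dict(zip(FIELDS, cols))
    return BlastHit(rec["qseqid"], rec["sseqid"], int(rec["qstart"]),
                    int(rec["qend"]), float(rec["evalue"]), float(rec["bitscore"]))


def group_by_evalue(lines):
    """Map each e-value to its hits, best first; file order is kept within a group."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():
            hit = parse_blast_line(line)
            groups[hit.evalue].append(hit)
    return dict(sorted(groups.items()))


# Three of the records above, without the grep "file.blast:" prefix.
records = [
    "405_1 ENSG00000055609_ENST00000262189_KMT2C_mRNA 100.00 32 0 0 44 75 1172 1203 1.2e-08 60.2",
    "405_1 ENSG00000184992_ENST00000341446_BRI3BP_mRNA 100.00 22 0 0 1 22 5012 5033 0.0044 41.7",
    "405_1 MIMAT0000244_miRBase_hsa-miR-30c-5p_microRNA 100.00 22 0 0 23 44 1 22 0.0044 41.7",
]
for evalue, hits in group_by_evalue(records).items():
    print(evalue, [(h.sseqid, h.qstart, h.qend) for h in hits])
```

Because the groups keep file order, BRI3BP_mRNA (read positions 1 to 22) is listed ahead of hsa-miR-30c-5p (23 to 44) within the 0.0044 group, exactly as in the BLAST file.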


dstrib commented 6 months ago

Thank you for this clarification.

Given that the BRI3BP_mRNA and hsa-miR-30c-5p hits have the same alignment e-value (0.0044), is the missing hybrid due to BRI3BP_mRNA occurring before hsa-miR-30c-5p in the list of alignments?

Additionally, are there any intermediate files between the blast and initial hyb file where the selected segments could be identified?

gkudla commented 6 months ago

On Wed, 13 Dec 2023 at 14:03, Daniel Stribling wrote:

> Given that they have the same alignment e-value, is this due to BRI3BP_mRNA occurring before hsa-miR-30c-5p in the list of alignments?

Possibly, yes.

> Additionally, are there any intermediate files between the blast and initial hyb file where the selected segments could be identified?

No, there will be no hybrids reported in any of the intermediate files for that read.
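The tie-breaking idea discussed here can be illustrated with a toy model. This is not hyb's code. It assumes that hits are visited in ascending e-value order using a stable sort (so equal e-values keep their file order), that each accepted hit masks the read positions it covers, and it invents two parameters, `max_fragments` and `max_overlap`, purely for illustration. The coordinates are taken from the BLAST records for read 405_1.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    target: str
    q_start: int  # 1-based, inclusive read coordinate
    q_end: int
    evalue: float


def greedy_select(hits, max_fragments=2, max_overlap=4):
    """Toy fragment selection; max_fragments and max_overlap are hypothetical."""
    masked = set()
    chosen = []
    # sorted() is stable: hits with equal e-values stay in their input order.
    for hit in sorted(hits, key=lambda h: h.evalue):
        span = set(range(hit.q_start, hit.q_end + 1))
        # Skip hits that overlap already-masked positions by more than max_overlap nt.
        if len(span & masked) > max_overlap:
            continue
        chosen.append(hit)
        masked |= span  # mask the read positions covered by the accepted hit
        if len(chosen) == max_fragments:
            break
    return chosen


kmt2c = Hit("KMT2C_mRNA", 44, 75, 1.2e-08)
bri3bp = Hit("BRI3BP_mRNA", 1, 22, 0.0044)
mir30c = Hit("hsa-miR-30c-5p_microRNA", 23, 44, 0.0044)
irf1 = Hit("IRF1-AS1_lncRNA", 1, 20, 0.058)

file_order = [kmt2c, bri3bp, mir30c, irf1]
swapped = [kmt2c, mir30c, bri3bp, irf1]
print([h.target for h in greedy_select(file_order)])
print([h.target for h in greedy_select(swapped)])
```

Under these assumptions the file order selects KMT2C_mRNA and BRI3BP_mRNA, while swapping the two tied hits selects KMT2C_mRNA and hsa-miR-30c-5p instead. Whether hyb's actual selection behaves this way is not confirmed beyond the maintainer's "possibly, yes".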
