Hi! I'm trying to reproduce the prediction matrices in the tutorial.
import os
import json
import subprocess
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pysam
from basenji import dataset, dna_io, seqnn
import tensorflow as tf
Unfortunately, I could not install and import cooltools. As a workaround, I copied the set_diag function directly from the cooltools source code:
def set_diag(arr, x, i=0, copy=False):
    if copy:
        arr = arr.copy()
    start = max(i, -arr.shape[1] * i)
    stop = max(0, (arr.shape[1] - i)) * arr.shape[1]
    step = arr.shape[1] + 1
    arr.flat[start:stop:step] = x
    return arr

def from_upper_triu(vector_repr, matrix_len, num_diags):
    z = np.zeros((matrix_len, matrix_len))
    triu_tup = np.triu_indices(matrix_len, num_diags)
    z[triu_tup] = vector_repr
    for i in range(-num_diags + 1, num_diags):
        set_diag(z, np.nan, i)
    return z + z.T
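To rule out the copied helper as the source of the problem, the two functions above can be exercised on a tiny input where the expected layout is easy to read by eye. This is only a minimal sketch: the 5x5 size and the toy vector are made up for illustration and are not values from the model.

```python
import numpy as np

def set_diag(arr, x, i=0, copy=False):
    # Same logic as the function copied above: write x onto the i-th diagonal.
    if copy:
        arr = arr.copy()
    start = max(i, -arr.shape[1] * i)
    stop = max(0, (arr.shape[1] - i)) * arr.shape[1]
    step = arr.shape[1] + 1
    arr.flat[start:stop:step] = x
    return arr

def from_upper_triu(vector_repr, matrix_len, num_diags):
    # Fill the upper triangle (starting num_diags off the main diagonal),
    # mask the central diagonals with NaN, then symmetrize.
    z = np.zeros((matrix_len, matrix_len))
    triu_tup = np.triu_indices(matrix_len, num_diags)
    z[triu_tup] = vector_repr
    for i in range(-num_diags + 1, num_diags):
        set_diag(z, np.nan, i)
    return z + z.T

# Toy input (illustrative only): one value per upper-triangle entry.
matrix_len, num_diags = 5, 2
n_entries = len(np.triu_indices(matrix_len, num_diags)[0])
vec = np.arange(1.0, n_entries + 1)

small = from_upper_triu(vec, matrix_len, num_diags)
print(small)
```

If the helper is working, the main diagonal and its immediate neighbours should be NaN, and every other entry should equal its mirror image across the diagonal.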
I initially tried to use my own local reference .fasta file for chromosome 15 and tried to segment a region
I then loaded in your model
This didn't end up working at the prediction stage, so I overwrote this sequence using the fasta provided
When I print out a bit of seq, I get: CAAAAACAAAAACTCCCTTCTGACCGCTGCCTTACTCAAG.... The resulting seq_1hot aligns with this sequence.
I then used the following, with the values from the JSON, to set up the reshaping of the array:
This is consistently outputting the following, as I was expecting:
flattened representation length: 99681
symmetrix matrix size: (448,448)
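The two reported numbers are consistent with each other, which can be checked with a little arithmetic. The sketch below assumes a target length of 512 bins, a crop of 32 bins per side, and a diagonal offset of 2; these three values are assumptions standing in for whatever the params JSON actually contains (the offset of 2 at least matches the NaN band in the matrix printed further down).

```python
import numpy as np

# Assumed values; in practice these would be read from the params JSON.
target_length = 512    # assumption: bins per side before cropping
target_crop = 32       # assumption: bins cropped from each side
diagonal_offset = 2    # assumption: number of masked central diagonals

# Side length of the symmetric matrix after cropping.
cropped_len = target_length - 2 * target_crop

# Number of upper-triangle entries starting diagonal_offset off the diagonal:
# rows contribute n, n-1, ..., 1 entries, where n = cropped_len - diagonal_offset.
n = cropped_len - diagonal_offset
closed_form = n * (n + 1) // 2

# The same count, obtained directly from NumPy's index generator.
counted = len(np.triu_indices(cropped_len, diagonal_offset)[0])

print("symmetric matrix size:", (cropped_len, cropped_len))
print("flattened length (closed form):", closed_form)
print("flattened length (counted):", counted)
```

Under these assumptions the shapes work out to 448 and 99681, so the reshaping setup itself does not look like the source of the unexpected values.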
I then tried to visualize
I've attached the resulting plot I get.
I initially thought it was from the reshaping/cooltools issue, so I checked mat
mat
The output is as follows and is definitely not what I was expecting:
array([[         nan,          nan,  -0.22503351, ..., -50.06995773, -50.18245697, -50.29498291],
       [         nan,          nan,          nan, ..., -49.95742035, -50.06995773, -50.18245697],
       [ -0.22503351,          nan,          nan, ..., -49.84490967, -49.95742035, -50.06995773],
       ...,
       [-50.06995773, -49.95742035, -49.84490967, ...,          nan,          nan,  -0.22503351],
       [-50.18245697, -50.06995773, -49.95742035, ...,          nan,          nan,          nan],
       [-50.29498291, -50.18245697, -50.06995773, ...,  -0.22503351,          nan,          nan]])
Then I looked at test_pred_from_seq[:,:, 0] itself
test_pred_from_seq[:,:, 0]
array([[-0.2250335 , -0.33755007, -0.450067 , ..., -0.2250335 , -0.33755007, -0.2250335 ]], dtype=float32)
I don't believe this is correct either. Please let me know if I've made any fatal errors. I can't seem to track down my mistake.
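One pattern in the printed values may help narrow this down: they appear to depend only on the distance from the diagonal (roughly -0.225 at offset 2, -0.338 at offset 3, -0.450 at offset 4, and about -50.29 at offset 447, i.e. close to -0.1125 per bin). A minimal sketch of a check for that pattern is below. The helper name diagonal_spread and the toy matrix are hypothetical, introduced only for illustration; in practice the real mat would be passed in instead.

```python
import numpy as np

def diagonal_spread(mat):
    """For each offset k >= 0, return max - min over the k-th diagonal.

    NaN entries are ignored; a fully NaN diagonal yields NaN.
    """
    n = mat.shape[0]
    spread = np.full(n, np.nan)
    for k in range(n):
        d = np.diagonal(mat, offset=k)
        d = d[~np.isnan(d)]          # drop masked entries
        if d.size:
            spread[k] = d.max() - d.min()
    return spread

# Toy stand-in for mat (illustrative only): value proportional to |i - j|,
# with the two central diagonals masked, mimicking the printed pattern.
size = 8
idx = np.arange(size)
dist = np.abs(idx[:, None] - idx[None, :]).astype(float)
toy = -0.1125 * dist
toy[dist < 2] = np.nan

spread = diagonal_spread(toy)
print("largest within-diagonal spread:", np.nanmax(spread))
```

If the real mat also shows a spread near zero on every diagonal, the prediction would carry no position-specific signal, which might point toward the model weights or the input rather than the reshaping step. That is a guess to test, not a diagnosis.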