calico / scBasset

Sequence-based Modeling of single-cell ATAC-seq using Convolutional Neural Networks.
Apache License 2.0

Issues with motif scoring #21

Closed · anu-bioinfo closed this 5 months ago

anu-bioinfo commented 5 months ago

Hello!

I am working on scRNA-seq and scATAC-seq data (unpaired, with different cell numbers) from mouse HSCs. When I use the motif_score function, I get the following error:

scores = motif_score('Gata1', model, motif_fasta_folder=motif_fasta_folder)
ad.obs['Gata1_activity'] = scores

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_67769/4276181541.py in <module>
----> 1 scores = motif_score('Gata1', model, motif_fasta_folder=motif_fasta_folder)
      2 ad.obs['Gata1_activity'] = scores

/mnt/NewHDD1/software/scBasset/scbasset/utils.py in motif_score(tf, model, motif_fasta_folder, bc, scale_method)
    475     fasta_bg = "%s/shuffled_peaks.fasta" % motif_fasta_folder
    476
--> 477     pred_motif = pred_on_fasta(fasta_motif, model, bc=bc, scale_method='sigmoid')
    478     pred_bg = pred_on_fasta(fasta_bg, model, bc=bc, scale_method='sigmoid')
    479     tf_score = pred_motif.mean(axis=0) - pred_bg.mean(axis=0)

/mnt/NewHDD1/software/scBasset/scbasset/utils.py in pred_on_fasta(fa, model, bc, scale_method)
    451     seqs = [str(i.seq) for i in records]
    452     seqs_1hot = np.array([dna_1hot(i) for i in seqs])
--> 453     pred = imputation_Y_normalize(seqs_1hot, model, bc_model=bc, scale_method=scale_method)
    454     return pred
    455

/mnt/NewHDD1/software/scBasset/scbasset/utils.py in imputation_Y_normalize(X, model, bc_model, scale_method)
    420         outputs=model.layers[-4].output,
    421     )
--> 422     Y_pred = new_model.predict(X)
    423     w = model.layers[-3].get_weights()[0]
    424     intercepts = model.layers[-3].get_weights()[1]

/mnt/NewHDD/anaconda3/envs/scbasset/lib/python3.7/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     68     # To get the full stack trace, call:
     69     # tf.debugging.disable_traceback_filtering()
---> 70     raise e.with_traceback(filtered_tb) from None
     71     finally:
     72         del filtered_tb

/mnt/NewHDD/anaconda3/envs/scbasset/lib/python3.7/site-packages/keras/engine/training.py in tf__predict_function(iterator)
     13     try:
     14         do_return = True
---> 15         retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16     except:
     17         do_return = False

ValueError: in user code:

File "/mnt/NewHDD/anaconda3/envs/scbasset/lib/python3.7/site-packages/keras/engine/training.py", line 2137, in predict_function  *
    return step_function(self, iterator)
File "/mnt/NewHDD/anaconda3/envs/scbasset/lib/python3.7/site-packages/keras/engine/training.py", line 2123, in step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/mnt/NewHDD/anaconda3/envs/scbasset/lib/python3.7/site-packages/keras/engine/training.py", line 2111, in run_step  **
    outputs = model.predict_step(data)
File "/mnt/NewHDD/anaconda3/envs/scbasset/lib/python3.7/site-packages/keras/engine/training.py", line 2079, in predict_step
    return self(x, training=False)
File "/mnt/NewHDD/anaconda3/envs/scbasset/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
File "/mnt/NewHDD/anaconda3/envs/scbasset/lib/python3.7/site-packages/keras/engine/input_spec.py", line 296, in assert_input_compatibility
    f'Input {input_index} of layer "{layer_name}" is '

ValueError: Input 0 of layer "model_7" is incompatible with the layer: expected shape=(None, 1344, 4), found shape=(None, 1500, 4)

Could you please help me out here? I get the same error for every TF that I choose.

Best,

Anupam
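
For future readers hitting the same traceback: the shape mismatch can be caught before calling motif_score by comparing the sequence lengths in the generated fasta files against the model's 1344 bp input. A minimal, dependency-free sketch (fasta_seq_lengths is an illustrative helper of mine, not part of scBasset):

```python
# Hypothetical diagnostic: report the sequence lengths found in a FASTA
# file, so they can be checked against the model's expected input (1344 bp).
def fasta_seq_lengths(path):
    """Return the set of distinct sequence lengths in a FASTA file."""
    lengths = set()
    seq_len = 0
    in_seq = False
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if in_seq:          # close out the previous record
                    lengths.add(seq_len)
                seq_len = 0
                in_seq = True
            else:                   # sequence may wrap across lines
                seq_len += len(line)
        if in_seq:                  # don't forget the final record
            lengths.add(seq_len)
    return lengths

# Usage: anything other than {1344} in the motif fasta folder will
# trigger the input_spec ValueError seen above.
# fasta_seq_lengths("motif_fasta/Gata1.fasta")
```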

hy395 commented 5 months ago

Hi Anupam,

The model works with 1344 bp sequences as input, but it looks like the fasta files in your motif_fasta_folder are 1500 bp. Did you download the human fasta files that we shared, or did you generate your own?

anu-bioinfo commented 5 months ago

Hello Han,

Thanks for your quick response. Yes, I generated my own fasta folder for mouse, and I had changed the sequence length from 1344 to 1500. I will rerun the motif generation code with the 1344 bp length. Thanks again for your help.
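
When regenerating the motif fasta files, each peak interval has to be resized to the 1344 bp width the trained model expects before extracting sequence. A hypothetical sketch of the recentering step (resize_interval is an illustrative helper, not scBasset code):

```python
# Sketch, assuming the model's fixed input width of 1344 bp: recenter each
# peak interval on its midpoint at that width before fetching sequence.
SEQ_LEN = 1344

def resize_interval(start, end, width=SEQ_LEN):
    """Return (new_start, new_end) of the given width, centered on the
    midpoint of the original interval and clipped at position 0."""
    mid = (start + end) // 2
    new_start = max(0, mid - width // 2)
    new_end = new_start + width
    return new_start, new_end
```

Running this over the mouse peak BED intervals (instead of a 1500 bp window) should produce fasta sequences with the (None, 1344, 4) one-hot shape the model accepts.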

Best,

Anupam