calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0

No checkpoints found. #147

Open wittney-m opened 1 year ago

wittney-m commented 1 year ago

Hi,

I am using basenji_train.py on my own data to train the model on a shared cluster. I am getting this output.

None
model_strides [128]
target_lengths [1024]
target_crops [0]
No checkpoints found.
Traceback (most recent call last):
  File "/home/miniconda3/envs/basenji1/bin/basenji_train.py", line 182, in <module>
    main()
  File "/home/miniconda3/envs/basenji1/bin/basenji_train.py", line 174, in main
    seqnn_trainer.fit_tape(seqnn_model)
  File "/home/miniconda3/envs/basenji1/lib/python3.8/site-packages/basenji/trainer.py", line 484, in fit_tape
    x, y = safe_next(train_iter)
  File "/home/miniconda3/envs/basenji1/lib/python3.8/site-packages/basenji/trainer.py", line 793, in safe_next
    d = next(data_iter)
  File "/home/miniconda3/envs/basenji1/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 766, in next
    return self._next_internal()
  File "/home/miniconda3/envs/basenji1/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 749, in _next_internal
    ret = gen_dataset_ops.iterator_get_next(
  File "/home/miniconda3/envs/basenji1/lib/python3.8/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 3017, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "/home/miniconda3/envs/basenji1/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 7209, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node wrappedIteratorGetNext_output_types_2device/job:localhost/replica:0/task:0/device:CPU:0}} Input to reshape is a tensor with 18432 values, but the requested shape has 18360
  [[{{node Reshape}}]] [Op:IteratorGetNext]

What does this error mean? How can I run this without error?

davek44 commented 1 year ago

This error indicates a mismatch between the training data shape and the code's expectation. Can you show me the command you ran to generate the dataset and the contents of the statistics.json file?

wittney-m commented 1 year ago

Hi Dave,

Thank you for responding. Below are the contents of the statistics.json file:

{
  "num_targets": 18,
  "seq_length": 131072,
  "seq_1hot": true,
  "pool_width": 128,
  "crop_bp": 256,
  "target_length": 1020,
  "train_seqs": 21,
  "valid_seqs": 1,
  "test_seqs": 1
}

Below is the command I used to generate the dataset:

! python3 /home//miniconda3/envs/basenji1/bin/basenji_data.py --restart --crop 256 --local -s .1 -o banseji_output -p 8 -t .1 -v .1 -w 128 Aspergillus_niger.ASM285v2.dna.toplevel.fa targets_file1.txt
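
For readers following the numbers: the two values in the reshape error can be related to the statistics.json fields with simple arithmetic. The sketch below uses only numbers quoted in this thread. It assumes that crop_bp is trimmed from both ends of each sequence, which is not stated here but is consistent with the reported target_length of 1020. The variable and function names are illustrative and are not part of basenji.

```python
# Values copied from the traceback and the statistics.json quoted in this thread.
stats = {
    "num_targets": 18,
    "seq_length": 131072,
    "pool_width": 128,
    "crop_bp": 256,
    "target_length": 1020,
}
values_in_record = 18432   # "Input to reshape is a tensor with 18432 values"
values_requested = 18360   # "but the requested shape has 18360"


def bins_per_target(total_values, num_targets):
    """Split a flat value count into bins per target; fail if it does not divide evenly."""
    bins, remainder = divmod(total_values, num_targets)
    if remainder != 0:
        raise ValueError(f"{total_values} is not a multiple of {num_targets}")
    return bins


# Number of bins implied by each side of the error message.
record_bins = bins_per_target(values_in_record, stats["num_targets"])
requested_bins = bins_per_target(values_requested, stats["num_targets"])

# Bins expected from the dataset settings.
uncropped_bins = stats["seq_length"] // stats["pool_width"]
# Assumption: crop_bp base pairs are removed from EACH end of the sequence.
cropped_bins = uncropped_bins - (2 * stats["crop_bp"]) // stats["pool_width"]

print("bins stored per record:  ", record_bins)
print("bins requested by reader:", requested_bins)
print("uncropped bins expected: ", uncropped_bins)
print("cropped bins expected:   ", cropped_bins)
```

If the assumption holds, the stored records carry the uncropped bin count per target while statistics.json asks for the cropped count, which is the kind of shape mismatch described above.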

wittney-m commented 1 year ago

I would also like to note that I copied the targets file from the tutorial. I am unsure what the sum_stat column represents.

index  identifier   file            clip  sum_stat  descripton
0      SRR10193405  SRR10193405.bw  384   sum       delprtT2
1      SRR10193406  SRR10193406.bw  384   sum       delprtT1
2      SRR10193407  SRR10193407.bw  384   sum       delamyR2
3      SRR10193408  SRR10193408.bw  384   sum       delamyR1
4      SRR10193409  SRR10193409.bw  384   sum       SH2MPY2
5      SRR10193410  SRR10193410.bw  384   sum       SH2MPY1
6      SRR10193411  SRR10193411.bw  384   sum       SH2DPY2
7      SRR10193412  SRR10193412.bw  384   sum       SH2DPY1
8      SRR10193413  SRR10193413.bw  384   sum       genome_2
9      SRR10193414  SRR10193414.bw  384   sum       genome_1
10     SRR10193415  SRR10193415.bw  384   sum       delcreA2
11     SRR10193416  SRR10193416.bw  384   sum       delcreA1
12     SRR10193417  SRR10193417.bw  384   sum       delpacC2
13     SRR10193418  SRR10193418.bw  384   sum       delpacC1
14     SRR10193419  SRR10193419.bw  384   sum       delcpcA2
15     SRR10193420  SRR10193420.bw  384   sum       delcpcA1
16     SRR10193421  SRR10193421.bw  384   sum       CBS513two
17     SRR10193422  SRR10193422.bw  384   sum       CBS513one