PacificBiosciences / pb-human-wgs-workflow-wdl

BSD 3-Clause Clear License

Make empty bai file with text #72

Closed vsmalladi closed 1 year ago

vsmalladi commented 1 year ago

@williamrowell ready to review. @byoo I think this will solve it. Based on the error, I think Cromwell couldn't see an empty file.

williamrowell commented 1 year ago

Ready to test @byoo

byoo commented 1 year ago

@vsmalladi, we prefer to use a read-only mount for sequencing data because it allows streaming and the data doesn't need to be cached as much. Wouldn't this change require the mount to be read-write? I wonder whether it would take considerable effort to stop using the indexed data type.

vsmalladi commented 1 year ago

@byoo we are not indexing the data. Just creating a dummy file.

It would take considerable effort to move away from the indexed data type, since the downstream processes expect the aBAM in that struct format, and we intend to revert this change once the fix is in.

byoo commented 1 year ago

@vsmalladi, ok, it would be fine for the time being. Thanks!

byoo commented 1 year ago

@vsmalladi, the fix didn't work. It may have created the empty file in the mount on the node, but the empty file is not created in the container that holds the input uBAM file. If the dummy index doesn't have to be in the same container as the uBAM, can it be created in the output dir of the align_ubam_or_fastq task?

@williamrowell, do you know the status of the issue where pbmm2 alignment misses a small fraction of input reads?

vsmalladi commented 1 year ago

@byoo not sure I understand what's happening. The pbmm2 task should have both the ubam and abam objects as outputs. These are then passed to each subsequent task. Can you send the error?

byoo commented 1 year ago

@vsmalladi, the error is the same: java.io.FileNotFoundException: Could not process output, file not found: on .bai files. Let me describe it in more detail. This change introduced the following code to create the empty file.

# Make temp ubam index
echo "empty file" > /cromwell-executions/trial/e25a3d53-4be4-42bb-a118-6c6111c09cab/call-smrtcells_trial/smrtcells_trial/3c9c3952-0315-4c23-8e98-763053f81f10/call-smrtcells_affected_person/shard-2/smrtcells_person/b877fb5e-96bf-4742-813b-7129bdf76a07/call-smrtcells/shard-0/smrtcells/8b386fa1-272b-45b0-bb01-3df054398032/call-align_ubam_or_fastq/inputs/[Azure BLOB container path]/m64223e_220619_070425.hifi_reads.bam.bai
) > "$out8b386fa1" 2> "$err8b386fa1"

The command ran with exit code 0, and I assume the empty file was created in the input path. But the file is not created in the container that Cromwell's compute node for the task mounts/copies data from: [Azure BLOB container]/m64223e_220619_070425.hifi_reads.bam.bai

Creating a file in the input path does not appear to synchronize back to the container that the data is mounted/transferred from, and Cromwell checks for the existence of the bai files in the container that is the input to the workflow. Does that make sense? If you would like any further info or logs, please feel free to let me know. Thanks!
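For illustration, here is a minimal sketch of the alternative raised earlier in this thread: writing the placeholder index into the task's own working directory instead of next to the input uBAM. The function name make_dummy_index and its arguments are hypothetical and are not taken from the workflow. The sketch assumes the execution directory is writable and is where Cromwell looks for declared task outputs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the workflow): create a placeholder .bai
# in a writable directory rather than inside the inputs mount, which may be
# read-only and is not synchronized back to the source container.
make_dummy_index() {
    local input_bam="$1"      # path to the uBAM under the inputs mount
    local out_dir="${2:-.}"   # assumed writable task execution directory
    local index_path
    index_path="${out_dir}/$(basename "${input_bam}").bai"

    # Write non-empty text so the file is not zero bytes.
    echo "empty file" > "${index_path}"

    # Print the path so the caller can reference it as an output.
    echo "${index_path}"
}
```

Called as make_dummy_index /path/to/inputs/sample.hifi_reads.bam "$PWD", this would leave sample.hifi_reads.bam.bai in the current directory. Whether downstream tasks accept an index that is not co-located with the uBAM is a separate question that this sketch does not answer.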

vsmalladi commented 1 year ago

@byoo I see the error. It is how I referred to the file.

Fixing it.