byu-dnasc / proto-smrtlink-share


Testing #30

Open adknaupp opened 2 months ago

adknaupp commented 2 months ago

Test environment

There are several prerequisites to testing smrtlink-share, depending on the level of functionality to be tested. The sections below describe the components that make up the testing environment.

Testing dataset

Since the dataset and project modules rely only on access to dataset XML files, these modules can be tested on any ORC machine. They can also be tested on a machine where smrtlink-container has been installed and used to download dummy dataset XMLs and their dependencies, but in this case only Revio-imported datasets are readily available (see below).

Revio imports

Most dataset files are uploaded to group storage by the Revio. In the example below, r84100_20240417_004514 is the directory for a run, and 1_A01 and 1_B01 are subdirectories for two of the cells that were run. Each of these cell subdirectories contains five subdirectories.

r84100_20240417_004514
|-- 1_A01
|   |-- fail_reads
|   |-- hifi_reads
|   |-- metadata
|   |-- pb_formats
|   `-- statistics
|-- 1_B01
...

Using Revio files for testing

The pb_formats directory will contain dataset XML files that can be used to test the app. These XML files reference files found in the other four subdirectories (fail_reads, hifi_reads, metadata, and statistics). Luckily, the XML files reference these 'external resource' files using relative paths, meaning that you can copy all five directories together to any location and the XML files they contain will still be valid (i.e., pbcore will not raise the errors it would raise if files referenced in an XML could not be found at the expected paths).
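One way to check this property on a copied dataset is to resolve each external resource against the XML file's own directory. The following is a minimal sketch using only the Python standard library rather than pbcore. It assumes that external resources appear as elements whose tag ends in ExternalResource and that carry a ResourceId attribute, which should be confirmed against your own XML files. The function name missing_resources is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse


def missing_resources(xml_path):
    """Return the relative ResourceId values in xml_path that do not resolve.

    Assumption: external resources are elements whose tag ends in
    'ExternalResource' and which carry a 'ResourceId' attribute.
    """
    xml_path = Path(xml_path)
    base_dir = xml_path.parent  # relative paths are resolved from the XML's directory
    missing = []
    for element in ET.parse(xml_path).iter():
        if not element.tag.endswith("ExternalResource"):
            continue
        resource_id = element.get("ResourceId")
        if not resource_id:
            continue
        # Skip absolute paths and URIs; this check only covers relative paths.
        if urlparse(resource_id).scheme or Path(resource_id).is_absolute():
            continue
        if not (base_dir / resource_id).exists():
            missing.append(resource_id)
    return missing
```

An empty list for every XML file in pb_formats suggests the five directories were copied together intact. This does not replace loading the dataset with pbcore, which may perform further validation.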

Transplanting datasets

For the reasons discussed above, dataset XML files created by the Revio and their associated files are suitable to be "transplanted" from the system where they originated to some other system where we would like to do our testing. Scripts are available in the smrtlink-container repository which automate this process.
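For illustration only, the core of such a transplant can be sketched in a few lines of Python. This is not the script from smrtlink-container. The function name transplant_cell and its arguments are hypothetical, and the sketch simply copies the five subdirectories of one cell together so that the relative paths inside the XML files keep resolving.

```python
import shutil
from pathlib import Path

# The five subdirectories the Revio writes for each cell (see the listing above).
CELL_SUBDIRS = ("fail_reads", "hifi_reads", "metadata", "pb_formats", "statistics")


def transplant_cell(cell_dir, destination):
    """Copy one cell directory (e.g. 1_A01) into destination, keeping its layout."""
    cell_dir = Path(cell_dir)
    target = Path(destination) / cell_dir.name
    for name in CELL_SUBDIRS:
        source = cell_dir / name
        if not source.is_dir():
            raise FileNotFoundError(f"expected subdirectory is missing: {source}")
        # dirs_exist_ok lets the copy be re-run over a partial transplant (Python 3.8+).
        shutil.copytree(source, target / name, dirs_exist_ok=True)
    return target
```

Copying all five subdirectories as a unit is the important part: leaving one out would break the relative references held by the XML files in pb_formats.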

Analysis datasets

Dataset files created by the Revio are only one flavor of dataset that can be part of a project in SMRT Link (albeit the most common). The other main category of dataset found in SMRT Link is the one generated by SMRT Analysis jobs. From a logistical standpoint, one of the main differences between analysis datasets and Revio-imported datasets is that files referenced by analysis datasets cannot be easily "transplanted" from the system where they originated to another system where we would like to do our testing. Therefore, another approach should be developed so that analysis datasets can be used in testing.

Testing smrtlink-share against the production files

All SMRT Link files are available to members of the fslg_dnasc file sharing group. Therefore, if you can log in to the BYU Office of Research Computing's system, you can test the dataset module using hard-coded XML file paths to analysis datasets (or any dataset, for that matter).