Clinical-Genomics / microSALT

Microbial Sequence Analysis and Loci-based Typing pipeline for use on NGS WGS data.
GNU General Public License v3.0

Add reference genome size to typing report #142

Closed henningonsbring closed 2 years ago

henningonsbring commented 3 years ago

Description

Adds a row to the typing report showing the size of the reference genome given by the customer, so that the assembly size can be compared with the size of the reference.

This PR also fixes an error in the installation of microSALT.

Primary function of PR

Testing

If the update is a hotfix, it is sufficient to rely on the development testing along with the Travis self-test automatically applied to the PR.

Test routine to verify the stability of the PR:

Verify that the results for projects MIC3109, MIC4107, MIC4109 & ACC5551 are consistent with the results attached to AMSystem doc 1490, MicrobialWGS.xlsx.

Test results

These are the results of the tests, and necessary conclusions, that prove the stability of the PR.

Sign-offs

henningonsbring commented 2 years ago

The length of the S. aureus reference is 2729352 bp:

cat /home/proj/production/microbial/references/genomes/AP017922.1.fasta | grep -v -E "^>" | tr -d '\n' | wc -c
2729352
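
The same count can be reproduced without the shell pipeline. This is a minimal Python sketch, assuming an uncompressed FASTA file with Unix line endings; the function name `reference_length` is hypothetical and is not taken from microSALT.

```python
def reference_length(fasta_path):
    """Count sequence characters in a FASTA file, skipping header lines."""
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            # Skip record headers, mirroring grep -v -E "^>"
            if line.startswith(">"):
                continue
            # Drop the line ending, mirroring tr -d '\n'
            total += len(line.rstrip("\n"))
    return total
```

Called with the path to AP017922.1.fasta, this should give the same number as the `wc -c` pipe, since both approaches count every non-header character except newlines.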
henningonsbring commented 2 years ago

"Referensens genomstorlek" (Swedish for "the reference's genome size") is now displayed in the typing report, and it matches the size identified with the wc -c pipe:

Screen Shot 2021-09-28 at 15 25 25
henningonsbring commented 2 years ago

Added a new column for the reference genome length to /home/proj/stage/microbial/meta/microsalt.db using the script below, run with the following command: python /home/henning.onsbring/db_scripts/add_column_sqlite.py -c reference_length -t samples -d /home/proj/stage/microbial/meta/microsalt.db -y INTEGER

# Add a column to a SQLite database
import sqlite3

from argparse import ArgumentParser

parser = ArgumentParser()

parser.add_argument("-c", "--column", dest="column",
                    help="Name of column",
                    metavar="<string>", required=True)
parser.add_argument("-t", "--table", dest="table",
                    help="Name of table",
                    metavar="<string>", required=True)
parser.add_argument("-d", "--database", dest="database",
                    help="Path to database",
                    metavar="<path>", required=True)
parser.add_argument("-y", "--type", dest="type",
                    help="What is stored in the column",
                    metavar="<string>", required=True)

args = parser.parse_args()

# Open a connection to the database
connection = sqlite3.connect(args.database)

# Get a cursor
cursor = connection.cursor()

# Add the column to the table. Table and column names cannot be passed as
# bound SQL parameters, so they are formatted into the statement directly.
addColumn = "ALTER TABLE %s ADD COLUMN %s %s" % (args.table, args.column, args.type)
cursor.execute(addColumn)
connection.commit()

print("Added column %s to table %s with the type %s" % (args.column, args.table, args.type))

# Close the database connection
connection.close()
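
Running ALTER TABLE ... ADD COLUMN a second time against the same database fails with a "duplicate column name" error. Because the same migration is applied to more than one database (stage and production), a re-runnable variant may be useful. The following is only a sketch: the helper `add_column_if_missing` is hypothetical, and it assumes the table, column and type names come from a trusted operator.

```python
import sqlite3


def add_column_if_missing(connection, table, column, column_type):
    """Add a column unless the table already has it; return True if added."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted here.
    quoted_table = '"%s"' % table.replace('"', '""')
    quoted_column = '"%s"' % column.replace('"', '""')

    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # SQLite column names are case-insensitive, so compare in lower case.
    info = connection.execute("PRAGMA table_info(%s)" % quoted_table)
    existing = {row[1].lower() for row in info}
    if column.lower() in existing:
        return False

    connection.execute(
        "ALTER TABLE %s ADD COLUMN %s %s" % (quoted_table, quoted_column, column_type)
    )
    connection.commit()
    return True
```

With a connection opened by `sqlite3.connect(args.database)`, the call `add_column_if_missing(connection, "samples", "reference_length", "INTEGER")` would add the column on the first run and do nothing on later runs.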
talnor commented 2 years ago

One question about the python script. Does it add None values for existing samples in the database or how does it work? :)

henningonsbring commented 2 years ago

> One question about the python script. Does it add None values for existing samples in the database or how does it work? :)

The script only creates an empty column; it does not write any values to it. Existing rows should get the default value, but I am not sure what the default value is. That can be checked by looking in the database /home/proj/stage/microbial/meta/microsalt.db. However, every time you run a microSALT analysis for a ticket, whatever value is in that column is overwritten.
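
For reference, a column added in SQLite with ALTER TABLE ... ADD COLUMN and no DEFAULT clause holds NULL for all existing rows, which Python's sqlite3 module returns as None. The sketch below illustrates this with an in-memory database and a made-up one-column `samples` table; the real microSALT schema has more columns.

```python
import sqlite3

# In-memory database with one pre-existing row (illustrative schema only)
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE samples (sample_id TEXT)")
connection.execute("INSERT INTO samples VALUES ('existing_sample')")

# Same statement shape as the script above: no DEFAULT clause
connection.execute("ALTER TABLE samples ADD COLUMN reference_length INTEGER")

rows = connection.execute(
    "SELECT sample_id, reference_length FROM samples"
).fetchall()
print(rows)  # [('existing_sample', None)]
connection.close()
```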

henningonsbring commented 2 years ago

Ran a single sample: cg workflow microsalt start -s ACC8231A4

Screen Shot 2021-12-07 at 17 08 46

The screenshot shows how the reference genome size is displayed in the typing report after analysing a single sample.

henningonsbring commented 2 years ago

@talnor Time to merge this? It is working now, both for tickets and for single samples. Since the PR is working, the install script is evidently working too. I have also rephrased a warning message based on your review.

henningonsbring commented 2 years ago

Ran on hasta:

python /home/henning.onsbring/db_scripts/add_column_sqlite.py -c reference_length -t samples -d /home/proj/production/microbial/meta/microsalt.db -y INTEGER
Added column reference_length to table samples with the type INTEGER