compbiocore / qcdb

QC Database

Library read type #10

Closed JRWallace closed 4 years ago

JRWallace commented 5 years ago

Should we remove library read type from the metadata table? As it stands, we run into an issue when parsing Picard alignment metrics for PE data: for example, should the alignment information be allotted to SRS_SRX_1 or SRS_SRX_2? We suggest getting rid of library_read_type, making each sample ID just SRS_SRX, and including both read 1 and read 2 data in the same .json file. We would then import the SRA metadata for each experiment, which will help us determine the read type. This might require us to strongly suggest that users specify whether the data they are pulling QC data from is PE or SE. @ashokrags
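To make the proposal concrete, here is a rough sketch of what one record per SRS_SRX could look like, with read 1 and read 2 kept side by side. The function name `build_sample_record`, the keys `reads`, `alignment`, and `inferred_layout`, and the idea of inferring the layout from which reads are present are all assumptions for illustration, not the current qcdb schema. In practice the layout would come from the SRA metadata or from the user.

```python
def build_sample_record(db_id, fastqc_by_read, picard_metrics=None):
    """Combine per-read FastQC metrics and per-sample Picard metrics.

    db_id          -- the SRS_SRX combination, with no read suffix
    fastqc_by_read -- dict mapping read number (1 or 2) to a metrics dict
    picard_metrics -- dict of alignment metrics for the whole sample, or None

    All field names here are hypothetical, not the qcdb schema.
    """
    reads = sorted(fastqc_by_read)
    if reads == [1]:
        layout = "SE"
    elif reads == [1, 2]:
        layout = "PE"
    else:
        raise ValueError(f"unexpected read numbers for {db_id}: {reads}")

    return {
        "db_id": db_id,
        # Assumption: a stand-in until SRA metadata supplies the real layout.
        "inferred_layout": layout,
        # Per-read metrics live under one sample instead of separate IDs.
        "reads": {f"read_{r}": fastqc_by_read[r] for r in reads},
        # Alignment metrics describe the sample, so they need no read suffix.
        "alignment": picard_metrics or {},
    }
```

With this shape, the question of whether alignment metrics belong to SRS_SRX_1 or SRS_SRX_2 goes away, since they attach to the sample and only FastQC-style metrics are split by read.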

aguang commented 5 years ago

@JRWallace what did we decide on this?

aguang commented 4 years ago

We are also running into an additional issue. FastQC files are named SRS_SRX_1_fastqc.zip and SRS_SRX_2_fastqc.zip, so they are inserted into the database with db_id SRS_SRX_1 and SRS_SRX_2. Picard alignment metrics files are named SRS_SRX_metric.txt, so their db_id is SRS_SRX_se. What we actually want is to find Picard alignment, FastQC, and any other metrics for the same SRS_SRX combination. As such, db_id should be only the unique combination of SRS_SRX, without the library layout.
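A minimal sketch of how both file name formats could be reduced to the same db_id. The regular expressions and the helper name `parse_qc_filename` are assumptions based only on the two formats mentioned above. It also assumes that neither the SRS nor the SRX accession contains an underscore, and it does not cover single-end FastQC files, whose naming is not stated in this thread.

```python
import re

# Hypothetical patterns derived from the file names quoted in this thread.
_FASTQC = re.compile(
    r"^(?P<srs>[^_]+)_(?P<srx>[^_]+)_(?P<read>[12])_fastqc\.zip$"
)
_PICARD = re.compile(
    r"^(?P<srs>[^_]+)_(?P<srx>[^_]+)_(?P<metric>.+)\.txt$"
)


def parse_qc_filename(name):
    """Return (db_id, read_number) for a QC file name.

    db_id is always SRS_SRX; read_number is 1 or 2 for FastQC files
    and None for Picard files, which describe the whole sample.
    """
    m = _FASTQC.match(name)
    if m:
        return f"{m['srs']}_{m['srx']}", int(m["read"])
    m = _PICARD.match(name)
    if m:
        return f"{m['srs']}_{m['srx']}", None
    raise ValueError(f"unrecognized QC file name: {name}")
```

Because the read number is returned separately instead of being folded into the ID, all metrics for one SRS_SRX can be looked up with a single key.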