ERGA-consortium / EARs

The ERGA Assembly Reports repository
MIT License

QV is sometimes not parsed into the yaml #120

Open talioto opened 1 week ago

talioto commented 1 week ago

The QV is sometimes missing from the yaml. This seems to be a bug related to newline character(s). See https://genomes.cnag.cat/erga-stream/assemblies/ for a list of the EARs with missing QVs.

diegomics commented 1 week ago

Thanks for pointing this out :) I see that these are all Sanger EARs. I was not able to replicate the issue; maybe the porting they are doing adds a \n character or something similar? I will ask them to update to the latest version of the make_EAR.py script. There is one non-Sanger EAR (#72), from Genoscope, but its QV and Kcomp for the pre-curation assembly are missing, so that is a different issue. The reviewer and/or supervisor should have pointed this out, but since it concerns the pre-curation assembly, I don't think it is a big deal. I will update the pdf/yaml parser to handle this case and the newline. I also spotted a few other parsing improvements that I can implement.
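
For illustration only, here is a minimal sketch of a newline-tolerant way to pull a QV value out of extracted text. It is not the actual parser code: the function name `extract_qv`, the pattern `QV_PATTERN`, and the assumed layout (a `QV` label, an optional `:` or `=`, then a number) are all hypothetical, since the real EAR text layout is not shown in this thread.

```python
import re

# Hypothetical layout: the label "QV", an optional ":" or "=", then a number.
# \s* matches spaces, tabs, "\n" and "\r", so a stray line break between the
# label and the value does not prevent a match.
QV_PATTERN = re.compile(r"\bQV\b\s*[:=]?\s*(\d+(?:\.\d+)?)")


def extract_qv(text):
    """Return the first QV value found in `text` as a float, or None."""
    match = QV_PATTERN.search(text)
    return float(match.group(1)) if match else None
```

Returning `None` rather than raising lets the caller decide whether a missing QV should be written as an empty yaml field or flagged for review.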