Closed kevinrue closed 5 years ago
I can reproduce the issue, and it does relate to an empty file. I have added a check for empty files; here is the branch: https://github.com/cgat-developers/cgat-flow/tree/AC-fix-table-load. Can you check whether you still have any issues? I think the touch may have to be replaced with a sqlite command that creates an empty table, but I will do this once you have confirmed that the issue is fixed.
Thanks @Acribbs
In short: it seems to work.
In long: loadFastQC completes, although as you pointed out, no table is generated, which may cause an issue downstream. I'll run the rest of the pipeline and report back here.
Again, as you said, it probably just needs an sqlite statement that makes a table with the right header but no record.
Cheers!
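For reference, a minimal sketch of what "a table with the right header but no record" could look like. It uses Python's standard-library sqlite3 module directly rather than the pipeline's own loader, so it may not match how cgat-flow actually loads tables. The function name create_empty_table and the three column names in the usage lines are only illustrative.

```python
import sqlite3


def create_empty_table(conn, table, columns):
    """Create `table` with the given column names and no rows (hypothetical helper)."""
    # Double-quote identifiers so that names containing spaces are accepted.
    cols = ", ".join('"{}"'.format(c.replace('"', '""')) for c in columns)
    conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, cols))
    conn.commit()


# Illustrative usage against an in-memory database.
conn = sqlite3.connect(":memory:")
create_empty_table(
    conn, "fastqc_overrepresented_sequences", ["track", "Count", "Sequence"]
)
cursor = conn.execute("SELECT * FROM fastqc_overrepresented_sequences")
header = [d[0] for d in cursor.description]  # column names of the new table
rows = cursor.fetchall()                     # no records were inserted
conn.close()
```

The columns are declared without types here; SQLite accepts that, but a real table would probably want explicit types.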
No worries @kevinrue. I will add the sqlite command, check that it passes, and then push.
@kevinrue creating empty tables in sqlite isn't supported, do you have a suggestion as to what I can add as a filler?
Thinking out loud, here is a sample record from a data set that has a non-empty table:
sqlite> SELECT * FROM fastqc_overrepresented_sequences limit 1;
track|Count|Duplication Level|Percentage|Percentage of deduplicated|Percentage of total|Possible Source|Sequence
Benoist-GSE92597-Aire-ChIP-IgG-IlluminaMiSeq-GSM2433236-SRR5122567|27945||0.267658800112753|||No Hit|ACTTCCAGGGATTTATAAGCCGATGACGTCATAACATCCCTGACCCTTTA
Can we make up a dummy record that has NAs everywhere? If variable typing matters at all, we can put "na" and 0 in the right places, e.g.
track|Count|Duplication Level|Percentage|Percentage of deduplicated|Percentage of total|Possible Source|Sequence
na|0||0|||No Hit|na
?
Not useful: From a brief look at the FastQC HTML report, when there are no overrepresented sequences, they replace the table with the statement "No overrepresented sequences".
EDIT: what I suggest is having a dummy table packaged in the CGAT pipeline docs that mimics the output of FastQC overrepresented sequences but only has a single row with NAs and 0s everywhere, and would be served to the ... | python -m cgatcore.csv2db --retry --database-url=sqlite:///./csvdb --add-index=track --table=fastqc_overrepresented_sequences when fastqc_overrepresented_sequences.tsv.gz is empty.
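A rough sketch of that fallback, assuming "empty" means a zero-byte file or one with no non-whitespace content. The header and dummy values are copied from the example above; the function name table_text_or_dummy is hypothetical, and this only produces the text that would be piped to the loader. It is not the change that was actually made in the pipeline.

```python
import gzip
import os
import tempfile

# Header and dummy record as proposed in this thread.
HEADER = ["track", "Count", "Duplication Level", "Percentage",
          "Percentage of deduplicated", "Percentage of total",
          "Possible Source", "Sequence"]
DUMMY_ROW = ["na", "0", "", "0", "", "", "No Hit", "na"]


def table_text_or_dummy(path):
    """Return the TSV text stored in `path`, or a one-row dummy table if it is empty."""
    if os.path.getsize(path) > 0:
        with gzip.open(path, "rt") as handle:
            text = handle.read()
        if text.strip():
            return text
    # Zero-byte file or no content: fall back to header plus dummy record.
    return "\t".join(HEADER) + "\n" + "\t".join(DUMMY_ROW) + "\n"


# Demonstration with a zero-byte file, as `touch` would leave behind.
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, "fastqc_overrepresented_sequences.tsv.gz")
    open(empty_path, "wb").close()
    dummy_text = table_text_or_dummy(empty_path)
```

A file that contains only a header line is returned unchanged by this sketch, so that case would still need separate handling.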
Hi @kevinrue, can you have a look at the change I made? I'm a little worried that we might get database locks because of ruffus sending two statements at the same time. If you see this issue, then we may have to think about how to overcome it.
Yep. Sorry for the delay: I just tried and it works: the table appears in csvdb and the dummy row is there too.
No issue about database lock as far as I can see.
Thanks!
OK, cool, thanks. If you experience a database lock going forward, let me know and I will add a random pause to the statement.
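One possible shape for such a random pause, sketched with the standard-library sqlite3 module. The helper name execute_with_retry and its retries and max_pause parameters are hypothetical; the pipeline statement itself goes through csv2db with --retry, which this does not reproduce.

```python
import random
import sqlite3
import time


def execute_with_retry(db_path, statement, retries=5, max_pause=2.0):
    """Run `statement`, sleeping for a random interval whenever the database is locked.

    Returns the index of the attempt that succeeded.
    """
    for attempt in range(retries):
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(statement)
            return attempt
        except sqlite3.OperationalError as error:
            # Only retry on lock errors, and give up after the last attempt.
            if "locked" not in str(error) or attempt == retries - 1:
                raise
        finally:
            conn.close()
        # Random pause so that two competing statements are unlikely to collide again.
        time.sleep(random.uniform(0, max_pause))


# Illustrative usage: no lock here, so the first attempt succeeds.
first_try = execute_with_retry(":memory:", "CREATE TABLE demo (track TEXT)")
```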
Unfortunately, the empty table command implemented in #70 does not seem to overwrite the table, producing a "table already exists" error if the pipeline is rerun.
Fixed.
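The thread does not say how this was fixed. One common way to make such a step safe to rerun is to drop the table before recreating it, sketched below with the standard-library sqlite3 module. The helper name recreate_table and the two-column table are assumptions for illustration only.

```python
import sqlite3


def recreate_table(conn, table, columns, rows=()):
    """Drop `table` if it exists, then create it again and insert `rows` (hypothetical helper)."""
    cols = ", ".join('"{}"'.format(c.replace('"', '""')) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute('DROP TABLE IF EXISTS "{}"'.format(table))
        conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), rows
        )


# Running the step twice simulates a pipeline rerun; the second call must not fail.
conn = sqlite3.connect(":memory:")
for _ in range(2):
    recreate_table(
        conn, "fastqc_overrepresented_sequences", ["track", "Count"], [("na", 0)]
    )
row_count = conn.execute(
    "SELECT COUNT(*) FROM fastqc_overrepresented_sequences"
).fetchone()[0]
conn.close()
```

Dropping the table discards whatever was loaded before, which is fine for a dummy placeholder but would need more care if real records had to be kept.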
Hi,
The following statement fails:
As an attempt to debug, I’ve just done the whole:
Same error.
For the record:
So one should be able to replicate the issue by creating a dummy file that only contains the header above, and run (in the cgat-f conda environment):
Best, kevin