AgResearch / gbs_prism

refactored GBS processing

port keyfile sanitising to gquery #45

Open afmcc opened 8 months ago

afmcc commented 8 months ago

Formatting problems with externally supplied ("walkin") keyfiles, such as ragged ends and non-ASCII characters, were previously handled by the script sanitiseKeyFile.py in this repo, which was called by importOrUpdateKeyfile.sh. That bash script has been deprecated, as keyfile import is now handled by gquery/gupdate. So we need to port the code from sanitiseKeyFile.py into the illumina.py module of gupdate, around line 1797, i.e. into this code:

    with open(get_predicate("walkins_file"),"r") as walkins:
        walkin_columns = None
        walkin_records = []
        for rec in walkins:
            if walkin_columns is None:
                walkin_columns = [ item.lower() for item in re.split("\t",rec.strip()) ]
            else:
                fields = re.split("\t",rec)
                if len(fields) != len(walkin_columns):
                    raise illumina_sequencing_exception("number of fields in in header of walkins file %s (%d) is not always the same as in the rest of the file (%d)"%
                                                        ( get_predicate("walkins_file"), len(walkin_columns),len(fields)))  
                walkin_records.append(dict(zip(walkin_columns, fields)))
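
A rough sketch of the kind of sanitising that could run before the field-count check above. This is not the contents of sanitiseKeyFile.py (that script is not shown here); the helper name `sanitise_keyfile_lines`, the choice to drop non-ASCII characters rather than transliterate them, and the pad/trim policy for ragged rows are all assumptions for illustration. Note that the quoted code splits the header after `rec.strip()` but splits data rows unstripped, so trailing tabs on data rows are one possible source of a header/row mismatch.

```python
def sanitise_keyfile_lines(lines):
    """Hypothetical helper (not the real sanitiseKeyFile.py logic).

    Takes an iterable of raw tab-delimited lines and returns a list of
    field lists, all the same length as the header.
    """
    header_len = None
    clean = []
    for lineno, raw in enumerate(lines, start=1):
        # Assumption: non-ASCII characters (non-breaking spaces, smart
        # quotes, BOMs and so on) are simply dropped.
        text = raw.encode("ascii", "ignore").decode("ascii")
        # Strip line endings only, so empty trailing fields are still visible.
        text = text.rstrip("\r\n")
        if not text.strip():
            continue  # skip blank lines
        fields = [f.strip() for f in text.split("\t")]
        if header_len is None:
            # Treat trailing empty header cells as padding.
            while fields and fields[-1] == "":
                fields.pop()
            header_len = len(fields)
        elif len(fields) < header_len:
            # Ragged short row: pad with empty fields.
            fields += [""] * (header_len - len(fields))
        elif len(fields) > header_len:
            # Ragged long row: only trim if the surplus fields are empty.
            if any(fields[header_len:]):
                raise ValueError(
                    "line %d has %d fields but the header has %d"
                    % (lineno, len(fields), header_len)
                )
            fields = fields[:header_len]
        clean.append(fields)
    return clean
```

With something like this in place, the existing loop could iterate over the sanitised rows instead of the raw file lines, and the exception would then only fire for rows that carry real data beyond the header width.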

afmcc commented 8 months ago

The patch is so we don't see this kind of thing:

running gupdate --explain -t create_gbs_keyfiles -p "fastq_folder_root=/dataset/2023_illumina_sequencing_c/scratch/postprocessing/illumina/novaseq;run_folder_root=/dataset/2023_illumina_sequencing_c/active;out_folder=/dataset/hiseq/active/key-files;sample_sheet=/dataset/2023_illumina_sequencing_c/active/240109_A01439_0232_AHNGHFDRX3/SampleSheet.csv;import" all

    !!!*****!!!
    * oops something went wrong :(
    * The original exeption encountered is below. To help debug the
    * problem, a log of this session is here :
    * /dataset/genophyle_data/scratch/gupdate/all-job.7.log
    !!!*****!!!

    Traceback (most recent call last):
      File "/dataset/gseq_processing/active/bin/gquery/gupdate.py", line 378, in <module>
        sys.exit(main())
      File "/dataset/gseq_processing/active/bin/gquery/gupdate.py", line 324, in main
        illumina.illumina(s).create_gbs_keyfiles()
      File "/bifo/active/gseq_processing/bin/gquery/sequencing/illumina.py", line 127, in create_gbs_keyfiles
        platform.create_gbs_keyfiles()
      File "/bifo/active/gseq_processing/bin/gquery/sequencing/illumina.py", line 1685, in create_gbs_keyfiles
        columns=self.create_gbs_keyfile(parameters_dict, key_path, append = False)
      File "/bifo/active/gseq_processing/bin/gquery/sequencing/illumina.py", line 1726, in create_gbs_keyfile
        walkin_columns = self.create_or_append_external_gbs_keyfile(predicates, key_path, append_existing)
      File "/bifo/active/gseq_processing/bin/gquery/sequencing/illumina.py", line 1771, in create_or_append_external_gbs_keyfile
        ( get_predicate("walkins_file"), len(walkin_columns),len(fields)))
    sequencing.illumina.illumina_sequencing_exception: number of fields in in header of walkins file /dataset/hiseq/active/key-files/SQ3047.txt (15) is not always the same as in the rest of the file (27)

    sorry - quitting after received bad return code from database import -try looking at the log file shown above
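
For checking a keyfile such as SQ3047.txt by hand before the port lands, a small diagnostic along these lines might help. It is only a sketch: `field_count_report` is a made-up name, and it assumes a tab-delimited file with one record per line. It counts how many lines have each number of fields and notes which lines contain non-ASCII characters.

```python
from collections import Counter


def field_count_report(lines):
    """Hypothetical diagnostic, not part of gquery.

    Returns (counts, non_ascii) where counts maps a field count to the
    number of lines with that many tab-separated fields, and non_ascii
    lists the 1-based line numbers containing non-ASCII characters.
    """
    counts = Counter()
    non_ascii = []
    for lineno, raw in enumerate(lines, start=1):
        # Strip line endings only, so trailing tabs still count as fields.
        counts[len(raw.rstrip("\r\n").split("\t"))] += 1
        if not raw.isascii():  # str.isascii() needs Python 3.7+
            non_ascii.append(lineno)
    return dict(counts), non_ascii
```

More than one key in the returned counts means the file is ragged, which is the condition the exception above is complaining about.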