v0.4.0

💥 Breaking

Changed the input from a path to a FASTQ file to a path to a directory. Guppy now writes its output as multiple FASTQ files under the barcodeXX/ directory. Previously, users had to concatenate the FASTQ files in barcodeXX/ into a single file and pass that file as the argument. With this change, the barcodeXX/ directory can be specified directly, so users can move straight from Guppy processing to DAJIN2 analysis.
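A minimal Python sketch of reading every FASTQ file in a barcodeXX/ directory follows. It is not DAJIN2's implementation: the helper names list_fastq_files and iter_fastq_records and the accepted suffixes (plain and gzip-compressed .fastq/.fq) are assumptions for illustration.

```python
from __future__ import annotations

import gzip
from pathlib import Path
from typing import Iterator

# Assumed FASTQ suffixes for Guppy output (plain and gzip-compressed).
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def list_fastq_files(directory: str | Path) -> list[Path]:
    """Return the FASTQ files directly under `directory`, sorted by file name."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.name.endswith(FASTQ_SUFFIXES)
    )


def iter_fastq_records(directory: str | Path) -> Iterator[tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples from every FASTQ file in `directory`."""
    for path in list_fastq_files(directory):
        # gzip files are opened in text mode; plain files use the built-in open().
        opener = gzip.open if path.name.endswith(".gz") else open
        with opener(path, "rt") as handle:
            while True:
                header = handle.readline().rstrip("\n")
                if not header:  # end of file
                    break
                sequence = handle.readline().rstrip("\n")
                handle.readline()  # skip the '+' separator line
                quality = handle.readline().rstrip("\n")
                yield header, sequence, quality
```

For example, iter_fastq_records("barcode01") would stream the reads of every FASTQ file in a hypothetical barcode01/ directory, in file-name order, without concatenating the files first.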

📝 Documentation

Changed the installation step in TROUBLESHOOTING.md from conda config --set channel_priority strict to conda config --set channel_priority flexible.

🚀 New Features

Added support for Apple Silicon (ARM64).
Changed the definition of a minor allele from an allele supported by 10 or fewer reads to one supported by 5 or fewer reads. This assumes that a sample contains 1,000 reads, of which 0.5% is 5 reads.
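The new threshold and its arithmetic can be checked with a small sketch. The constant and function names are hypothetical, and counting reads per allele label is a simplification of how DAJIN2 actually assigns reads to alleles.

```python
from __future__ import annotations

from collections import Counter

# v0.4.0 definition: an allele supported by <= 5 reads is "minor".
MINOR_ALLELE_MAX_READS = 5

# Rationale from the release note: 0.5% (= 5/1000) of an assumed 1,000-read sample.
ASSUMED_READS_PER_SAMPLE = 1000
assert ASSUMED_READS_PER_SAMPLE * 5 // 1000 == MINOR_ALLELE_MAX_READS


def minor_alleles(read_labels: list[str], max_reads: int = MINOR_ALLELE_MAX_READS) -> set[str]:
    """Return the allele labels supported by at most `max_reads` reads."""
    counts = Counter(read_labels)
    return {label for label, n in counts.items() if n <= max_reads}
```

Passing max_reads=10 reproduces the pre-0.4.0 definition, which also flags alleles with 6 to 10 reads as minor.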

🔧 Update

Updated preprocess.insertions_to_fasta to make Insertion alleles easier to distinguish: the reference sequences for Insertion alleles are now saved in the FASTA and HTML directories.
Updated insertions_to_fasta.extract_enriched_insertions: Previously, it calculated the proportion of each insertion allele separately for the sample and the control and filtered at 0.5%. Because of this threshold, some insertions present in the control fell just short of it and were misidentified as sample-specific insertions. The algorithm now clusters sample and control insertions together and excludes clusters that contain both, so that only sample-specific insertion alleles are extracted.
Updated the counting method in preprocess.insertions_to_fasta.count_insertions so that similar insertions are treated as identical. Previously, sequencing errors caused the same insertion to be counted as several different insertions.
Updated preprocess.insertions_to_fasta.merge_similar_insertions: Previously, insertions were clustered with MiniBatchKMeans, but this method produced too many clusters when only highly similar insertion sequences were present. The algorithm now follows a strategy similar to extract_enriched_insertions: it mixes the insertion data with random scores drawn from a uniform distribution before clustering.
Added preprocess.insertions_to_fasta.clustering_insertions, which consolidates the clustering used in extract_enriched_insertions and merge_similar_insertions into a shared function.
Moved the call_sequence function to the cssplits_handler module.
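The revised extract_enriched_insertions strategy (cluster sample and control insertions together, then discard any cluster that also contains control insertions) can be sketched as follows. This is an illustration, not DAJIN2's code: the greedy difflib.SequenceMatcher clustering and the min_ratio cutoff of 0.9 stand in for whatever clustering method DAJIN2 actually uses.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def greedy_cluster(sequences: list[str], min_ratio: float = 0.9) -> list[int]:
    """Assign each sequence to the first cluster whose representative is similar enough."""
    representatives: list[str] = []
    labels: list[int] = []
    for seq in sequences:
        for label, rep in enumerate(representatives):
            if SequenceMatcher(None, seq, rep).ratio() >= min_ratio:
                labels.append(label)
                break
        else:  # no similar representative: open a new cluster
            representatives.append(seq)
            labels.append(len(representatives) - 1)
    return labels


def extract_sample_specific(sample: list[str], control: list[str], min_ratio: float = 0.9) -> list[str]:
    """Cluster sample and control insertions together and keep sample-only clusters."""
    sequences = sample + control
    origins = ["sample"] * len(sample) + ["control"] * len(control)
    labels = greedy_cluster(sequences, min_ratio)

    origins_per_cluster: dict[int, set[str]] = {}
    for label, origin in zip(labels, origins):
        origins_per_cluster.setdefault(label, set()).add(origin)

    # Drop every cluster that also contains control insertions.
    sample_only = {label for label, found in origins_per_cluster.items() if found == {"sample"}}
    # zip stops after the sample entries, which come first in `labels`.
    return [seq for seq, label in zip(sample, labels) if label in sample_only]
```

Because no per-group percentage threshold is applied, a control insertion cannot slip through by falling just below a cutoff; any cluster it joins is removed.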
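The change to count_insertions (treating near-identical insertions as one) can be sketched by counting each insertion under the first previously seen sequence it resembles. The function name, the SequenceMatcher similarity measure, and the 0.9 cutoff are assumptions made for illustration, not DAJIN2's actual method.

```python
from collections import Counter
from difflib import SequenceMatcher


def count_similar_insertions(insertions, min_ratio=0.9):
    """Count insertions, merging any sequence that is similar to an earlier one."""
    representatives = []
    counts = Counter()
    for seq in insertions:
        for rep in representatives:
            # A few sequencing errors barely lower the ratio, so the read joins `rep`.
            if SequenceMatcher(None, seq, rep).ratio() >= min_ratio:
                counts[rep] += 1
                break
        else:  # first occurrence of a new insertion
            representatives.append(seq)
            counts[seq] += 1
    return counts
```

With exact-match counting, a single substitution would split one insertion into two entries; here both reads are counted under the first-seen sequence.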

🐛 Bug Fixes

Fixed clustering.merge_labels so that minor labels are correctly reverted to their parent labels.
Updated utils.input_validator.validate_genome_and_fetch_urls to determine available_server more reliably. Previously, it relied on HTTP status codes, but the UCSC Genome Browser sometimes returned a normal (200) response while in an internal error state. The function now checks whether the server is working by searching for specific keywords that appear in the normal HTML page.
Added config.reset_logging to reset the logging configuration. Previously, when multiple experiment IDs (names) were processed in batch, the logging settings from the previous experiment persisted and the log file name was not updated. Log files are now created separately for each experiment ID.
Fixed core.py: paths_predefined_fasta is now built from the ALLELE data entered by the user. Previously, it used the FASTA files stored in the fasta directory. However, if a previous run had been aborted, its leftover FASTA files (including newly detected insertions) were treated as predefined, so new insertions were incorrectly classified as predefined alleles.
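The intended behavior of clustering.merge_labels (sending reads in minor clusters back to their parent label) can be sketched as below. The explicit child-to-parent mapping, the integer labels, and the reuse of the 5-read minor-allele cutoff are assumptions for illustration; DAJIN2's actual label bookkeeping may differ.

```python
from __future__ import annotations

from collections import Counter


def revert_minor_labels(labels: list[int], parent_of: dict[int, int], max_minor_reads: int = 5) -> list[int]:
    """Replace each label held by <= max_minor_reads reads with its parent label."""
    counts = Counter(labels)
    return [parent_of[label] if counts[label] <= max_minor_reads else label for label in labels]
```

Reads in a sub-cluster too small to report fall back to the parent allele instead of being dropped or attached to an unrelated label.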
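A keyword-based availability check like the one described for validate_genome_and_fetch_urls might look like this sketch. The function names and the example keyword are hypothetical; the keywords DAJIN2 actually searches for on the UCSC Genome Browser page are not stated in these notes.

```python
from __future__ import annotations

import urllib.request


def looks_healthy(html: str, required_keywords: tuple[str, ...]) -> bool:
    """Return True only if every expected keyword appears in the page."""
    return all(keyword in html for keyword in required_keywords)


def is_server_available(url: str, required_keywords: tuple[str, ...], timeout: float = 10.0) -> bool:
    """Fetch `url` and judge availability by page content, not only by status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            html = response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts subclass OSError; malformed URLs raise ValueError.
        return False
    # A 200 response can still be an error page, so check the content as well.
    return looks_healthy(html, required_keywords)
```

Separating the content check from the network call keeps the keyword logic testable without contacting the server.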
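Resetting logging per experiment can be sketched with the standard library: logging.basicConfig(force=True), available since Python 3.8, removes and closes the root logger's existing handlers before installing new ones. The reset_logging signature and the one-log-file-per-experiment-ID layout are assumptions, not DAJIN2's config.reset_logging.

```python
import logging
from pathlib import Path


def reset_logging(log_path: Path) -> None:
    """Route the root logger to a new log file for the current experiment."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # force=True removes and closes any handlers left over from the previous
    # experiment before the new FileHandler is attached.
    logging.basicConfig(
        filename=str(log_path),
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        force=True,
    )
```

In a batch loop, calling reset_logging(Path("logs") / f"{experiment_id}.log") at the start of each hypothetical experiment_id sends that experiment's messages to its own file.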
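The core.py fix can be illustrated by contrasting two ways of collecting predefined FASTA paths. Both functions, the {name}.fasta naming scheme, and the dictionary of user-entered alleles are hypothetical; the point is that deriving paths from user input ignores files left behind by an aborted run.

```python
from __future__ import annotations

from pathlib import Path


def paths_from_directory(fasta_dir: str | Path) -> list[Path]:
    """Old behavior (sketch): every FASTA file in the directory counts as predefined."""
    return sorted(Path(fasta_dir).glob("*.fasta"))


def paths_from_user_alleles(fasta_dir: str | Path, user_alleles: dict[str, str]) -> list[Path]:
    """New behavior (sketch): only alleles the user supplied count as predefined."""
    fasta_dir = Path(fasta_dir)
    return [fasta_dir / f"{name}.fasta" for name in user_alleles]
```

If an aborted run left insertion1.fasta in the directory, the first function would report it as predefined; the second would not.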

akikuno / DAJIN2

Develop 0.4.0 #20
