databio / gtars

Performance-critical tools to manipulate, analyze, and process genomic interval data. Primarily focused on building tools for geniml, our genomic machine learning Python package.

Core utilities #3

Closed nleroy917 closed 1 year ago

nleroy917 commented 1 year ago

I am starting to work on some core utils/structs/traits that will probably be useful across modules inside here.

donaldcampbelljr commented 1 year ago

I had difficulty getting hinting to work, and Cargo was giving me an error. It appears to be because there is no Cargo.toml in the parent directory. I had to detach and then reattach the Cargo.toml at genimtools/genimtools/Cargo.toml.

nleroy917 commented 1 year ago

Hmm, are you using RustRover? I get around this in VSCode by specifying where my Cargo.toml files are in .vscode/settings.json, but maybe this isn't available in RustRover.

donaldcampbelljr commented 1 year ago

Yeah, it looks like it is related to RustRover: https://youtrack.jetbrains.com/issue/RUST-12231/RustRover-keeps-showing-Module-declaration-missing-warning

nleroy917 commented 1 year ago

Got it. I can also walk us all through it on Wednesday, and then use the suggestions made there as the code review.

donaldcampbelljr commented 1 year ago

Great. I'd like some sample files to test with (BED files and a universe). I'm currently trying the tokenize command on a couple of random BED files and running into an error:

drc@databio:~/GITHUB/genimtools/genimtools$ cargo run tokenize --bed /home/drc/Downloads/Example_Bed_Files/DRX129068.05.bed --universe /home/drc/Downloads/Example_Bed_Files/ERX1773918.05.bed
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
     Running `target/debug/genimtools tokenize --bed /home/drc/Downloads/Example_Bed_Files/DRX129068.05.bed --universe /home/drc/Downloads/Example_Bed_Files/ERX1773918.05.bed`
thread 'main' panicked at src/tokenizers/cli.rs:51:48:
Failed to read bed file: ComputeError(ErrString("found more fields than defined in 'Schema'\n\nConsider setting 'truncate_ragged_lines=true'."))

Bedfile 1 (abbreviated):

chr1    629142  631400  DRX129068.05_peak_1 535 .   2.29236 59.20779    53.55708    447
chr1    631750  633052  DRX129068.05_peak_2 419 .   2.10060 47.34952    41.90761    392
chr1    633474  634797  DRX129068.05_peak_3 374 .   2.04966 42.78794    37.46230    702

Bedfile 2 (abbreviated):

chr1    629256  630027  ERX1773918.05_peak_1    910 .   4.81171 97.91111    91.00097    560
chr1    630770  631362  ERX1773918.05_peak_2    312 .   3.06170 37.29456    31.22379    461
chr1    631875  632306  ERX1773918.05_peak_3    524 .   3.73832 58.89972    52.49679    165

nleroy917 commented 1 year ago

Oh, interesting. It seems like I might need to set truncate_ragged_lines=true to ignore extra fields. Thank you for discovering this!
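
For anyone following along, here is a rough std-only sketch of what "ignore extra fields" means for the 10-column peak files above: read only chrom, start, and end from each line and drop everything after that. This is not the genimtools reader and does not use the CSV reader that raised the error; `Region`, `parse_bed_line`, and `parse_bed` are hypothetical names made up for illustration.

```rust
/// A minimal genomic interval (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct Region {
    pub chrom: String,
    pub start: u32,
    pub end: u32,
}

/// Parse one BED line, keeping only the first three columns.
/// Any further columns (name, score, strand, ...) are silently ignored,
/// which mimics truncating ragged lines instead of failing on them.
pub fn parse_bed_line(line: &str) -> Result<Region, String> {
    let mut fields = line.split_whitespace();

    let chrom = fields.next().ok_or("missing chrom")?.to_string();
    let start = fields
        .next()
        .ok_or("missing start")?
        .parse::<u32>()
        .map_err(|e| format!("bad start: {e}"))?;
    let end = fields
        .next()
        .ok_or("missing end")?
        .parse::<u32>()
        .map_err(|e| format!("bad end: {e}"))?;

    Ok(Region { chrom, start, end })
}

/// Parse a whole BED text, skipping blank lines and '#' comment lines.
/// Stops at the first malformed line and returns its error.
pub fn parse_bed(text: &str) -> Result<Vec<Region>, String> {
    text.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .map(parse_bed_line)
        .collect()
}
```

Calling `parse_bed` on the contents of either abbreviated file above should yield one `Region` per line regardless of how many trailing columns the line has, while a line with fewer than three columns comes back as an `Err` rather than a panic.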