IEEE-VIT / templa-rs

One-Stop Solution for all boilerplate needs!
MIT License
28 stars 22 forks source link

Improve Searching Algorithm #2

Open mintbomb27 opened 2 years ago

mintbomb27 commented 2 years ago

Improve the searching algorithm present in perform_search().

Currently, the search works in an O(n) time complexity by comparing the input given by the -t flag with the tags array in the submodules.json.

Dinoosawruss commented 2 years ago

Hello! Are you assigning issues or should we just create a pull request once we've developed a solution?

If you are assigning could you please assign this to me 😄

mintbomb27 commented 2 years ago

Hey sure! Assigning.

Dinoosawruss commented 2 years ago

Ok brilliant

I have two main solutions currently

Solution 1 - Parallel Search The first solution I can think of is parallel searching them by finding the greatest factor of the length of Submodes and then searching equally through the array. This would not require the array to be sorted While this still O(n) it should in theory find the items faster

Issues

Pros

Solution 2 - Binary Search The second solution would be to sort the array and then use a binary search - the main issue with this is that the array would need to be sorted and as the tags were searching for are also arrays this complicates things a little bit This would be O(log(n))

Issues

Pros

Checklist Below is a checklist I'll just use to track what I need to do. For each of the solutions I'll code it up and test it out on a large data set to see what the time benefits of each is.

I will use this comment throughout the process to keep everything updated. If you have any comments or questions let me know - or if you have a better solution don't hesitate to suggest it

Dinoosawruss commented 2 years ago

Current System While the time complexity may be low due to the speed of Rust the current system is already extremely fast, my collected data is shown below:

Items Time (nano seconds) Time per item (nano seconds)
10 23400 (0.0234 milliseconds) 2340
100 145500 (0.1455 milliseconds) 1455
1000 704400 (0.7044 milliseconds) 704.4
Dinoosawruss commented 2 years ago

Alternative Linear Search While testing I also thought I'd consider an alternative linear search to the one used in the existing code. The code for this linear search is below:

let x = submodules.len();

let mut i = 0;

while i < x {
    if (tags.is_empty() || submodules[i].has_one_of_tags(tags)) && 
       (name_query.is_empty() || submodules[i].name.to_lowercase().contains(&lowercase_query)) 
    {
            filtered_sm.push(submodules[i].clone());
    } 

    i+=1;
}

Time Collections

Items Time (nano seconds) Time per item (nano seconds)
10 26600 ( milliseconds) 2660
100 82400 ( milliseconds) 824
1000 506600 ( milliseconds) 506.6

This alone is a significant time improvement and may even be better than other algorithms

mintbomb27 commented 2 years ago

@Dinoosawruss Great research! Loved the analysis. If the improved linear search algo is the best we can have as of now, then let's go for it! Really looking forward to your contribution :))

kugiyasan commented 2 years ago

@Dinoosawruss can I see how you tested? I tried to reproduce the same benchmarks, but my results were quite different, my times were much more linear when increasing the number of items.

running 6 tests
test alternate_linear_search_10   ... bench:       1,246 ns/iter (+/- 66)
test alternate_linear_search_100  ... bench:      12,016 ns/iter (+/- 941)
test alternate_linear_search_1000 ... bench:     120,654 ns/iter (+/- 9,336)
test perform_search_10            ... bench:       1,239 ns/iter (+/- 182)
test perform_search_100           ... bench:      12,307 ns/iter (+/- 3,357)
test perform_search_1000          ... bench:     122,835 ns/iter (+/- 8,086)

Here's the bench.rs I used

I ran the file with cargo bench rustc 1.57.0-nightly (11491938f 2021-09-29)

Dinoosawruss commented 2 years ago

@Dinoosawruss can I see how you tested? I tried to reproduce the same benchmarks, but my results were quite different, my times were much more linear when increasing the number of items.

running 6 tests
test alternate_linear_search_10   ... bench:       1,246 ns/iter (+/- 66)
test alternate_linear_search_100  ... bench:      12,016 ns/iter (+/- 941)
test alternate_linear_search_1000 ... bench:     120,654 ns/iter (+/- 9,336)
test perform_search_10            ... bench:       1,239 ns/iter (+/- 182)
test perform_search_100           ... bench:      12,307 ns/iter (+/- 3,357)
test perform_search_1000          ... bench:     122,835 ns/iter (+/- 8,086)

Here's the bench.rs I used

I ran the file with cargo bench rustc 1.57.0-nightly (11491938f 2021-09-29)

I simply timed how long each operation took, did this 50 times and took the average While this is a somewhat nieve approach I was not at the time aware of cargo bench