ArthurMPassos opened 1 month ago
Here is a script that generates some files to use as input to reproduce the issue.
```python
import numpy as np

path = "../datasets/"

def generate_uniform_dataset(size, low, high):
    """
    Generate a uniformly distributed dataset.
    """
    return np.random.uniform(low, high, size).astype(int)

def generate_random_dataset(size, low, high):
    """
    Generate a randomly distributed dataset.
    """
    return np.random.randint(low, high, size)

def generate_skewed_dataset(size, low1, high1, low2, high2, skew_ratio=0.8):
    """
    Generate a skewed dataset with the specified ratio.
    """
    size_majority = int(size * skew_ratio)
    size_minority = size - size_majority
    majority_part = np.random.randint(low1, high1, size_majority)
    minority_part = np.random.randint(low2, high2, size_minority)
    return np.concatenate([majority_part, minority_part])

def save_dataset(filename, dataset):
    """
    Save the dataset to a file: the first line is the size,
    followed by one value per line.
    """
    with open(filename, 'w') as f:
        f.write(str(len(dataset)) + '\n' + '\n'.join(map(str, dataset)))

if __name__ == "__main__":
    ARRAY_SIZE = 2**16

    # Large random dataset
    large_random_dataset = generate_random_dataset(ARRAY_SIZE, 0, 2**19)  # 524288
    save_dataset(path + "lines_random.txt", large_random_dataset)

    # Large skewed dataset
    large_skewed_dataset = generate_skewed_dataset(ARRAY_SIZE, 0, 1000, 1001, 1000000)
    save_dataset(path + "lines_skewed.txt", large_skewed_dataset)

    # Generate and save sorted dataset
    sorted_dataset = np.sort(large_random_dataset)
    save_dataset(path + "lines_sorted.txt", sorted_dataset)

    # Generate and save reverse-sorted dataset
    reverse_sorted_dataset = sorted_dataset[::-1]
    save_dataset(path + "lines_reverse.txt", reverse_sorted_dataset)

    print("Datasets generated and saved to files.")
```
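Not part of the original report, but as a sanity check, a minimal Python reader for the format `save_dataset` produces (first line is the element count, then one integer per line) could look like:

```python
def load_dataset(filename):
    """Read a dataset file back: first line is the count, rest are values."""
    with open(filename) as f:
        size = int(f.readline())
        values = [int(line) for line in f]
    # Verify the header matches the number of values actually written
    assert len(values) == size, "header count does not match the data"
    return values
```

This can be run against any of the generated `lines_*.txt` files to confirm they round-trip before feeding them to Bend.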
I'll have to test your program later, but some things stand out to me.

First off, IO is indeed very slow right now, but it should be linear in the number of operations (counting each byte read, each argument passed, etc.).

The segfault sounds like some problem in the HVM interface with the OS.

I find it surprising that your program works as you expected, because it has some type errors regarding the use of `with`. When you do `with IO` and then execute a sequential operation with `var <- action`, the result of that block should be of IO type. You do this both in `main` and in the unused `get_dataset`.
Crash possibly the same as https://github.com/HigherOrderCO/Bend/issues/632
Reproducing the behavior
Running this script with `bend run-c`, I can parse a file with up to 2**15 elements using the following code, but with a 2**16-element file the beginning of the list is printed and then it seg faults. I can't see anything in the implementation itself that could cause that behavior.
Input file example: the file is parsed into a list of integers, with the first line being the size of the dataset. Example:
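The example file contents did not survive here; judging from `save_dataset` above, a 4-element input file (with arbitrary illustrative values) would look like:

```
4
523
17
88
901
```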
The script:
System Settings
Additional context
It might be related to this issue: https://github.com/HigherOrderCO/Bend/issues/632