Add uniq command - Githubissues

katherineanthony commented 4 months ago

Vinnie and I completed the implementation for uniq as follows:

flag of -c prints the count of the unique lines
flag of -w prints the unique words
flag of -wc prints the unique words and their counts

aajoseph2 commented 4 months ago

HI, I can code review this one!

aajoseph2 commented 4 months ago

Proposed Enhancements

Case Sensitivity:
    Implement functionality to treat words with different cases as the same (e.g., "Blah" and "blah" should be considered identical, with only "blah" being outputted).

Ignoring Trailing and Leading Spaces:
    Add functionality to ignore leading and trailing spaces (e.g., " blah" and "blah" should be treated as the same, with only "blah" being outputted).

Test Input and Observations

Using the input below with the -w flag, a user trap error is encountered, suggesting the program does not correctly parse blank spaces:
    blah blah blah
    test
    test
    test
    test
    Blah
           blah

Inefficient and Potentially Problematic Sorting Algorithm

The use of bubble sort may be inefficient for large datasets, and its implementation does not account for potential issues with duplicate or very similar strings, which could lead to an incorrect sort order in edge cases.

Identified Issues

Memory Management:
    In the lines() and words() functions, memory is allocated but not freed, potentially leading to memory leaks.

Error Handling in File Operations:
    There is no explicit error handling after file operations like open or close. Failure to open files or read/write errors could lead to undefined behavior or crashes.

Logic in print_uniq():
    The comparison logic assumes the existence of lines[curr + 1] without verifying that curr + 1 < count. This could lead to accessing out-of-bounds memory, resulting in undefined behavior or crashes.

Increased Size of File Input Text:
    "I cannot tell if it is just my machine, but when I double the lines of README.md, trying to read more lines or words, the program seems to enter an infinite loop and never finds the EOF. Also, when adding spaces to certain lines and words, execution tends to lead to undefined behavior."

Example Test Input That

test file
test
testing this file
test file
test file
blah blah blah
blah
hello
joe
bat
okay
what
are
we
doing
here
again

Overall

"I had a lot of fun playing with the code and program. Thanks for allowing me to peer-review the code. Also, please let me know if I misinterpreted or misjudged something from the program's implementation."

AndrewLiu666 commented 3 months ago

Hi, Hana and I ( Andrew ) will code review this project!

hanag1234 commented 3 months ago

Hi, great job on your project! A few suggestions we (Andrew and Hana) would suggest are:

Memory Management: In the lines function, memory allocated with malloc for line is immediately freed after being added to the lines array. Instead, I suggest you free the memory only after you're done using the allocated strings, which should be after print_uniq has processed them.
Efficiency Concerns: The use of bubble sort for sorting lines or words is not the most efficient choice, especially for larger datasets. Consider implementing or using a more efficient sorting algorithm, like quicksort or mergesort, to improve performance.
Logic Error: The loop in print_uniq does not correctly handle the last unique word or line since it relies on comparing the current item with the next one, which doesn't work for the last element.
Incorrect Continuation in Loop in print_uniq: The else if condition checking for "\0" and "\n" uses continue without incrementing curr, leading to an infinite loop.

Overall, I enjoyed using the command and thought you did a great job.

katherineanthony commented 2 months ago

@malensek Vincent and I have completed these code corrections. Thank you!

USF-OS / FogOS

Add uniq command #27