andpet1324 / MNXB11-project-EVA

GNU General Public License v3.0
0 stars 2 forks source link

Overall comments on the content of the repository #41

Open floridop opened 1 week ago

floridop commented 1 week ago

This issue contains feedback about your project. In some cases it may contain actions to carry in order to get a final grade.

Workplan

The workplan is not really a workplan. There are no deadlines, no assigned tasks, and definitely too much detail about code. Rather than a workplan this is more an architectural overview of the code that you wrote as you were developing code.

Going back to the git history of it I can see at the beginning you had some sort of agreement of the main tasks.

ChangeLog

The Changelog is very good. I appreciate that you decided to use some sort of standard. I recommend if you ever do this again to also add bullet lists expecially if it's a MarkDown formatted file. Nevertheless it was very good from the start and it shows that you guys contributes as it developed.

README.md

What is the CLI library? and why did you use it? Same questions for all the other libraries. I can guess what you used them for, but it's nicer if one knows what is installing...

I followed the instructions carefully and here's the problems I encountered:

Build and install the CLI library

Actually there's no building nor installing in this section, there's just coping a header file in a folder. I think what you meant is maybe "preparation steps for building the CLI library".

This library made me a bit scared, there's a lot of code in the header that should not really be there. But who cares about standards today? And I guess they're Python users.

Build and install the CSV library

Again, there's no building and installing, just preparing files for compilation.

I have even more bad feelings about this one. It's ok that you used it, but please do not copy these bad practices. No serious code in header files, it should be just the bare minimum to identify the functions. There's a lot of talk on the internet of whether one should do it or not, but everyone says that as a project grows it is important to keep them separated.

Build and install the fmt library

Ok, here there was actually some building!

There was a bit of confusion of where to create the external/include folder. this should have been mentioned before cding int dependencies I believe.

Maybe even better att this downloaded code should probably have been donwloaded inside a tmp folder.

Anyway I decided to reuse the existing external/include and use the dependencies for the build given that you had a comment that it had to be the same as for CLI.

There were also other errors with relative paths for example here: https://github.com/andpet1324/MNXB11-project-EVA/blob/400521db00ff63c6d59845aa545164956893ce97/README.md?plain=1#L72

should have actually been

cmake ../../fmt/ -DCMAKE_INSTALL_PREFIX=../../../external/

But it's ok, I got it. Easy mistake to made.

Build and install the Date library

Also here, it was just copying the headers, no installation. Here the folder paths where a bit messed up. I guess who wrote it expected everything to be inside dependencies/ but clearly most of it got confused. Anyway I understood the intention and copied the header in the required place.

Build the project

Here everything failed, because you forgot to note that all of this only works inside the container:

[pflorido@cosmos3 TeamG]$ make
g++ -Wall -Wextra -Werror -std=c++17 -O3 -I include -I external/include/ -I `root-config --incdir` -c src/Measurement.cxx -o src/Measurement.o -Wno-stringop-truncation
/bin/sh: line 1: root-config: command not found
/usr/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/11/../../../../lib64/crt1.o: in function `_start':
(.text+0x1b): undefined reference to `main'
collect2: error: ld returned 1 exit status
make: *** [Makefile:26: src/Measurement.o] Error 1

I think in the preparation is important to know in which environment this should have been ran. I built everything outside the container and it worked also inside...

How to run the project

I boldly did

Apptainer[pflorido@cosmos3 ~/git/MNXB11/TeamG]> ./main -i=smhi-opendata_1_72450_20231007_155620_Boras.csv -a=1
The datafile is smhi-opendata_1_72450_20231007_155620_Boras.csv
The file name: baredata_smhi-opendata_1_72450_20231007_155620_Boras.csv
terminate called after throwing an instance of 'io::error::can_not_open_file'
  what():  Can not open file "baredata_smhi-opendata_1_72450_20231007_155620_Boras.csv" because "No such file or directory".
Aborted (core dumped)

Right. It was just an example. Let's look at the dataset folder.

Wait what? there is no dataset? I do not really understand why you did this, it was ok to keep at least the dataset that you used for your project report. We actually provided the datasets in the project template, you could have kept/used those ones. Now the README.md doesn't match the content of the folder.

Now I have no idea what to download to test. Looking at your report, I took Falsterbo and placed it in datasets/

Second and third attempt:

Apptainer[pflorido@cosmos3 ~/git/MNXB11/TeamG]> ./main -i=datasets/smhi-opendata_1_52230_20241113_173325_Falsterbo.csv -a=1
The datafile is datasets/smhi-opendata_1_52230_20241113_173325_Falsterbo.csv
The file name: baredata_datasets/smhi-opendata_1_52230_20241113_173325_Falsterbo.csv
terminate called after throwing an instance of 'io::error::can_not_open_file'
  what():  Can not open file "baredata_datasets/smhi-opendata_1_52230_20241113_173325_Falsterbo.csv" because "No such file or directory".
Aborted (core dumped)
Apptainer[pflorido@cosmos3 ~/git/MNXB11/TeamG]> ./main -i=./datasets/smhi-opendata_1_52230_20241113_173325_Falsterbo.csv -a=1
The datafile is ./datasets/smhi-opendata_1_52230_20241113_173325_Falsterbo.csv
The file name: baredata_./datasets/smhi-opendata_1_52230_20241113_173325_Falsterbo.csv
terminate called after throwing an instance of 'io::error::can_not_open_file'
  what():  Can not open file "baredata_./datasets/smhi-opendata_1_52230_20241113_173325_Falsterbo.csv" because "No such file or directory".
Aborted (core dumped)

Mhh. Maybe this library doesn't like paths?

Or maybe there should have been some data cleaning step done that is missing here in this description? I see your code looks for some "baredata" filename.

Unfortunately this just doesn't work. See actions.

Here's what I had to do to make your code work:

Apptainer[pflorido@cosmos3 ~/git/MNXB11/TeamG/datasets]> ./smhicleaner.sh smhi-opendata_1_52230_20241113_173325_Falsterbo.csv 
Redirecting cleaner logs to 2024-11-13_smhicleaner.sh.log
[2024-11-13T18:39:09+01:00 Cleaner]: Copying input file smhi-opendata_1_52230_20241113_173325_Falsterbo.csv to original_smhi-opendata_1_52230_20241113_173325_Falsterbo.csv
[2024-11-13T18:39:09+01:00 Cleaner]: Finding the first line containing 'Datum'...
[2024-11-13T18:39:09+01:00 Cleaner]: Found line 10
[2024-11-13T18:39:09+01:00 Cleaner]: Perform cleanup and give the output directly
Apptainer[pflorido@cosmos3 ~/git/MNXB11/TeamG/datasets]> ls
2024-11-13_smhicleaner.sh.log  baredata_smhi-opendata_1_52230_20241113_173325_Falsterbo.csv  smhi-opendata_1_52230_20241113_173325_Falsterbo.csv
README.md              original_smhi-opendata_1_52230_20241113_173325_Falsterbo.csv  smhicleaner.sh
Apptainer[pflorido@cosmos3 ~/git/MNXB11/TeamG/datasets]> cp baredata_smhi-opendata_1_52230_20241113_173325_Falsterbo.csv ../
Apptainer[pflorido@cosmos3 ~/git/MNXB11/TeamG/datasets]> cd ..
Apptainer[pflorido@cosmos3 ~/git/MNXB11/TeamG]> ./main -i=./datasets/smhi-opendata_1_52230_20241113_173325_Falsterbo.csv -a=1
The datafile is ./datasets/smhi-opendata_1_52230_20241113_173325_Falsterbo.csv
The file name: baredata_./datasets/smhi-opendata_1_52230_20241113_173325_Falsterbo.csv
terminate called after throwing an instance of 'io::error::can_not_open_file'
  what():  Can not open file "baredata_./datasets/smhi-opendata_1_52230_20241113_173325_Falsterbo.csv" because "No such file or directory".
Aborted (core dumped)
Apptainer[pflorido@cosmos3 ~/git/MNXB11/TeamG]> ./main -i=smhi-opendata_1_52230_20241113_173325_Falsterbo.csv -a=1
The datafile is smhi-opendata_1_52230_20241113_173325_Falsterbo.csv
The file name: baredata_smhi-opendata_1_52230_20241113_173325_Falsterbo.csv
Info in <TCanvas::Print>: png file mean_temperature_over_a_year.png has been created
corrupted size vs. prev_size
Aborted (core dumped)

One very bad thing here is that the baredata_ prefix is hardcoded in C++ and there is no warning anywhere about this being an issue. This is usually very bad, please try not to hide many details to the user.

In practice, your code takes a filename as input, but actually looks for another one. Very confusing!!

The results matched the plots in the report.

BASH Code

The only BASH code I have seen is the one provided during the course, and I see that you modified it to your needs.

Just what is completely missing is what you used it for, I had to guess while reviewing the project code. This is not good, I would like to have a clarification of how you used it and at least one example dataset that I can use to test.

See actions.

C++ Code

The overall code structure is very nice, I appreciated that you created a measurement class that was then used by different group members for different analysis.

I was a bit surprised by the "infinite measurement" check in https://github.com/andpet1324/MNXB11-project-EVA/blob/400521db00ff63c6d59845aa545164956893ce97/src/Measurement.cxx#L17 Is that something that happened to you?

Looking at DataExtraction.cxx I realized that there was actually a logical error and that's why my attempts didn't work; lots of paths are hardcoded, the BASH script for cleaning is not ran.

https://github.com/andpet1324/MNXB11-project-EVA/blob/400521db00ff63c6d59845aa545164956893ce97/src/DataExtraction.cxx#L6-L20

A typical error:

Apptainer[pflorido@cosmos-pkf ~/git/MNXB11/TeamG]> ./main -i=smhi-opendata_1_52230_20241113_173325_Falsterbo.csv -a=3
The datafile is smhi-opendata_1_52230_20241113_173325_Falsterbo.csv
The file name: baredata_smhi-opendata_1_52230_20241113_173325_Falsterbo.csv
terminate called after throwing an instance of 'io::error::can_not_open_file'
  what():  Can not open file "baredata_smhi-opendata_1_52230_20241113_173325_Falsterbo.csv" because "No such file or directory".
Aborted (core dumped)
Apptainer[pflorido@cosmos-pkf ~/git/MNXB11/TeamG]> ls -l smhi-opendata_1_52230_20241113_173325_Falsterbo.csv 
-rw-r--r-- 1 pflorido hep 6660322 Nov 14 15:21 smhi-opendata_1_52230_20241113_173325_Falsterbo.csv

the file is there, but the hardcoded script with hardcoded paths cannot run and the generated baredata file does not exist. The only thing that worked was to manually clean and copy the baredata file.

Anyway please do not ever call BASH from inside C++. It's ok to call a C++ binary from BASH but the opposite is pure evil.

Also, avoid putting hardcoded paths anywhere, it will just confuse you (and me)

ROOT

I have no special comments on the ROOT part of the code, but I like that you used it in C++ and not as a .C script.

Summary

Overall a good project, with good C++ code, and good use of a shared object for data management.

The only problem I see is that you got lost in paths and calling external scripts. Do not call scripts from C++, you want your compiled code not to depend on external scripts to run. It would have been better to do the opposite, that is, to create a BASH script to automate dependencies installation and execution of your main.

The most curious thing in your project is how many times and places you misspelled the word "AUTHORS" ;)

Action(s) to take

A.1. Provide the exact data files that you used to create the plots in the report. Upload them in the dataset folder.

A.2. Provide a working example of how to run your code. Do not omit additional steps required for the code to work.

A.3. Review the hardcoded paths many in DataExtraction.cxx. It is not OK to have '~username/git' hardcoded somewhere, even as default.

andpet1324 commented 1 week ago

Hello!

Thank you for the feedback. Is it a must that we need to fix the actions for our grades? We are 2 weeks into new courses with heavy workload. We were not foretold about having to fix our code after the course has ended so we could not have planned for it beforehand.

Regards, Andreas

floridop commented 1 week ago

I understand. I think this is a decision for @0xana and should be eventually discussed on Canvas. I also understand that this is mainly a confusion with existing paths on the cluster and not a major issue with your code -- I can see in the code how you made it work. It just won't work for anybody else without editing and recompiling I believe :)