DRAFT: csv export support

juliangaal commented 3 months ago

Hi there,

after #2 I took the liberty to start working on csv export support

Current changes:

csv(stream) that matches print(stream)
get_summary (formerly get_summary_table) extended to support ResultFormat: TABLE, CSV, DB, JSON
result_csv implemented to print summary
- file system operations later
- exporting the "detailed" overview requires further discussion
- currently includes info and summary table in the same csv string (default for printing currently), see comments below

Issues and Discussion

Please provide clang-format file. Formatting is tab-based and hard to adhere to, I did my best.
IMO placing units in the header makes exporting to different formats (CSV, JSON, DB) easier down the line. This way there is no need to write a function that does, e.g. table_percentage yet doesn't append "%". It also makes result file parsing much simpler.
```
filename,function,line,calls,ae[1-99]%,ae[0-100]%,time ae[0-100],time a[0-100]
ctrack.cpp,expensiveOperation,6,10,99.95%,99.95%,576.21 ms,576.21 ms
```
would become
```
filename,function,line,calls,ae[1-99]%,ae[0-100]%,time ae[0-100]ms,time a[0-100]ms
ctrack.cpp,expensiveOperation,6,10,99.95,99.95,576.21,576.21
```
Now, when parsing, there is no need to handle units explicitely
I think splitting into an 'info.csv' and 'results.csv' is necessary, because each table/csv has its own header.

How do you want to export the "details" table in the future? That part I haven't touched at all yet. By extending the headers to, e.g. min (fastest [0-1]%), min (center[1-99]%)

+---------+---------+----------+----------+----------+-----------+-----------+----------+---------+---------+
|   fastest[0-1]%   |                           center[1-99]%                           | slowest[99-100]%  |
+---------+---------+----------+----------+----------+-----------+-----------+----------+---------+---------+
|   min   |  mean   |   min    |   mean   |   med    |  time a   |  time ae  |   max    |  mean   |   max   |
+---------+---------+----------+----------+----------+-----------+-----------+----------+---------+---------+
| 0.00 ns | 0.00 ns | 51.12 ms | 57.14 ms | 58.00 ms | 571.36 ms | 571.36 ms | 59.01 ms | 0.00 ns | 0.00 ns |
+---------+---------+----------+----------+----------+-----------+-----------+----------+---------+---------+

Compaile commented 3 months ago

thanks, for the pr we will review it shortly.

one quick thing to add alredy. Leaving the units only in the header wont be great for terminal ussage.

times can be ns can be us can be ms or seconds. it will be hard to read if people have something like 10000000 ms in a slow code path.

but i aggree for csv it would be better so i'll think about it

Compaile commented 2 months ago

Hi @juliangaal , I like the general structure of the pr!. Here are some topics which you mentioned and my suggestions: a) For the file-based export, the detailed overview can just be extra columns. At the end of the day, the summary overview is just a short version of the details table merged. So I would just add the extra columns for the missing entries (I can also take care of that if you want). b) Regarding the units, I see your point. I still want to have the units in string (logfile) or std::cout export. So we could add that as a parameter overload. What unit should we use as default if we leave them out? ms? µs? Or is this a setting? c) Can you explain why splitting into info and result.csv would be necessary? d) I agree we should provide a clang-format file. Will add that to the todo.

juliangaal commented 2 months ago

a) sure, I would appreciate that b) My latest commit implemented just that, for summary and detail. Regarding "leaving out the units", I would only leave them out of the data, but place the unit in the header? That way the user doesn't have to chose, and future self will not have to answer the question which unit was used during generation... c) I guess this is partly personal preference, but I believe it makes parsing easier. Consider this (current csv output of get_summary):

   Start,End,time total,time ctracked,time ctracked %  <-- info header
   2024-08-13 11:53:16,2024-08-13 11:53:17,565.93,565.58,99.94  <-- info
   filename,function,line,calls,ae[1-99]%,ae[0-100]%,time ae[0-100],time a[0-100] <-/ summary header
   ctrack.cpp,expensiveOperation,6,10,99.94,99.94,565.58,565.58 <-- summary

Processing this with e.g. pandas means discarding the first two rows just to access the other (for benchmarking more useful) data. We could of course write the data "next to" each other in the csv file

Compaile commented 2 months ago

Here's the text with spelling corrections: Perfect. a) I will add it to your PR soon. b) Yes, we can put them in the header when we don't want them inside the rows. However, when you have a big project and tracked many functions, it will be hard to set a good unit value in the header. As an example, let's say we have functions which need 50 seconds and some which need 100 μs. If we choose μs as header, the 50 seconds entry would show 50000000000, which is hard to read. In CSV, I totally agree it makes sense. What default unit should we use for CSV (since it needs to be constant) for all entries? Milliseconds? c) I prefer to just add some more rows and write it next to the other data. That's more common than having 2 CSV files. So I would just add the info entries (Start, End, time total, time tracked, time tracked %) to the start or end of the CSV lines. Could you take care of this?

Then i will add a) and merge. Thanks for the nice idea and implementation

juliangaal commented 2 months ago

regarding b) couldn't we just provide an option where the dev can pass, e.g. std::chrono::milliseconds?

In general, I am just a little confused about your goals with appending rows to the right of the csv file. E.g.

Start,End,time total,time ctracked,time ctracked %  <-- info header
2024-08-13 11:53:16,2024-08-13 11:53:17,565.93,565.58,99.94  <-- info
filename,function,line,calls,ae[1-99]%,ae[0-100]%,time ae[0-100],time a[0-100] <-/ summary header
ctrack.cpp,expensiveOperation,6,10,99.94,99.94,565.58,565.58 <-- summary

would have to be changed to

Start,End,time total,time ctracked,time ctracked, filename,function,line,calls,ae[1-99]%,ae[0-100]%,time ae[0-100],time a[0-100] 
2024-08-13 11:53:16,2024-08-13 11:53:17,565.93,565.58,99.94,,,,,,,,
,,,,,,,,ctrack.cpp,expensiveOperation,6,10,99.94,99.94,565.58,565.58 
,,,,,,,,ctrack.cpp,expensiveOperation2,6,10,99.94,99.94,565.58,565.58 
....

is this the format you are intending to provide?

Compaile / ctrack

DRAFT: csv export support #5

Current changes:

Issues and Discussion