TravelMapping / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project

threaded terminal output #281

Open yakra opened 4 years ago

yakra commented 4 years ago

Different threads write to the terminal concurrently, resulting in output like MD (2805,3438) (2516,3149) (2516,3149) MP (190,237MOZ ) ((155,202) (1015155,1016) (,411202) ,412) (411,412) MTQ (339,459) (317,437) (317,437)

Solution: Use + instead of << operator

jteresco commented 4 years ago

With +, are the writes atomic enough to avoid the overlap completely, or will it just improve the chances of avoiding bad interleavings? If we care a lot, all terminal output could be funneled back to the original thread for output.

yakra commented 4 years ago

Atomic enough to avoid the overlap completely, I believe. I've switched to + before to solve this type of problem.

+ in this context is the std::string concatenation operator, so we first construct a std::string and then insert the whole thing into std::cout. Contrast multiple << operators inserting multiple C strings / char arrays or std::strings in multiple operations.

yakra commented 4 years ago

If we care a lot, all terminal output could be funneled back to the original thread for output.

This may be what we need to get Augmenting travelers for detected concurrent segments to scale beyond ~3-5 threads.

yakra commented 4 years ago

Experimentation for https://github.com/yakra/DataProcessing/issues/117 revealed this may not be as atomic as I thought. Sure, that uses an ofstream and this uses cout, but they're both forms of ostream to my understanding, and ought to behave similarly. I might give it a try for giggles; if results are still no good then we can use a mutex. The performance hit should be pretty close to 0.

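yakra commented 3 years ago

# find offending lines
cd TravelMapping/devel/cpuscale/sulogs/2021-01-06/

for file in *.log; do
  # line numbers bracketing the continent graph output
  start=$(grep -n 'continent graphs' "$file" | cut -f1 -d:)
  end=$(grep -n 'Marking' "$file" | cut -f1 -d:)
  len=$((end - start))
  # strip well-formed "(v,e) (v,e) (v,e) " triples, then count the lines that still contain digits
  count=$(tail -n +"$start" "$file" | head -n "$len" | sed 's~([0-9]\+,[0-9]\+) ([0-9]\+,[0-9]\+) ([0-9]\+,[0-9]\+) ~~g' | grep -v '[Ww]riting' | grep -c '[0-9]')
  # a clean log leaves exactly one such line; report anything else
  if [ "$count" != 1 ]; then
    printf '%s\t%s\n' "$count" "$file"
  fi
done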

318 examples in 4000 logs (lab1 & lab1.5) in sulogs/2021-01-06/

yakra commented 3 years ago

Heh.

https://github.com/TravelMapping/DataProcessing/blob/4879245fdfa727e57f90cd33b110a7ddd14e9bd6/siteupdate/cplusplus/classes/GraphGeneration/HighwayGraph.cpp#L512-L515

Turns out I was already using the + instead of << operator from the get-go.

How did I not look immediately below that? https://github.com/TravelMapping/DataProcessing/blob/4879245fdfa727e57f90cd33b110a7ddd14e9bd6/siteupdate/cplusplus/classes/GraphGeneration/HighwayGraph.cpp#L516-L519 is what's getting messed up.

@jteresco wrote:

With +, are the writes atomic enough to avoid the overlap completely, or will it just improve the chances of avoiding bad interleavings? If we care a lot, all terminal output could be funneled back to the original thread for output.

+ only improves the chances of avoiding bad interleavings. Compared with yakra#117, there should be much less chance of them happening though.

That said, I'd like to try the + solution rather than a mutex. It's cleaner and more maintainable, changing 3 lines in one file rather than creating a new variable & passing it around to 16 different files.

Nope.

Needs a mutex.

yakra commented 3 years ago

Dead branch: term_mtx ed53a0162bf7f917827a518bff36805b32e3ba6f: cherrypick onto a9bf1588a285e9737c3421af510a5565eb9a854f

yakra commented 3 years ago