[FEATURE] Add execution statistics to Apalache

lemmy commented 4 years ago

@konnov asked during the TLA+ Community Event about worldwide TLA+ user numbers. TLC happens to collect execution statistics that give us a rough approximate. Does Apalache want to integrate it execution statistics collection as well? I'm happy to share all the technical details if sending statistics to INRIA is not a no-go. What's collected (for TLC) is listed at https://github.com/tlaplus/tlaplus/blob/master/tlatools/org.lamport.tlatools/src/util/ExecutionStatisticsCollector.md and the data is publicly shared.

konnov commented 4 years ago

It would not be a problem for us to ping the INRIA server. However, the users might get worried that we are spying on them. Would it be more reasonable to have a joint announcements list, so people would be asked to subscribe to it when they use TLC, Apalache, the VScode extension, etc?

lemmy commented 4 years ago

The Toolbox prompts users to explicitly opt-in: opt67240009-618aa200-f405-11e9-840e-05e165b92dd0

konnov commented 4 years ago

Right. That makes sense in the Toolbox. We probably can add an opt-in option in the command-line. But I doubt anybody would trigger it :-)

lemmy commented 4 years ago

Prompting users to opt-in is the task of IDEs such as the Toolbox, VSCode, .... Like TLC, Apalache would simply report stats iff ~/.tlaplus/esc.txt exists and has an opt-in marker:

https://github.com/tlaplus/tlaplus/blob/master/tlatools/org.lamport.tlatools/src/util/ExecutionStatisticsCollector.java

konnov commented 4 years ago

How about the following. We isolate all the statistics collection in a well-defined script called bin/opt-in-report-stat. The model checker script bin/apalache-mc warns the user about the option of turning on the statistics collection. We add two commands to apalache-mc, e.g., --feedback-allow, --feedback-shutup. The command --feedback-allow writes REPORT_FEEDBACK=1 in ~/.apalache/feedback.rc, whereas the command --feedback-shutup write REPORT_FEEDBACK=0 in ~/.apalache/feedback.rc. The script bin/opt-in-report-stat uses this configuration file.

@shonfeder , what is your opinion?

konnov commented 4 years ago

We will also have to add a page explaining what we collect and how we do that. The large part is already written by @lemmy: https://github.com/tlaplus/tlaplus/blob/master/tlatools/org.lamport.tlatools/src/util/ExecutionStatisticsCollector.md

shonfeder commented 4 years ago

The approach sounds OK to me, but I don't know anything about this stuff and I'm not sure how the script will work to monitor the stats we want to collect. I've never implemented or facilitated any end-user telemetry before.

Wrapping stuff in additional scripts doesn't seem ideal to me, tho I get that this is for the sake of isolating the ~spying~ telemetry code.

Having a dedicated file for setting one flag seems non-ideal. Should we just have a more general configuration file with this as our first option?

edited to correcter terminology

lemmy commented 4 years ago

What is the reason to re-write execution statistics/telemetry data collection? I created the issue, assuming that you would re-use the existing ExecutionStatisticsCollector and we change the wording in the Toolboxes (Eclipse & VSCode) to include Apalache and TLC. If you develop your own telemetry implementation, it is probably best to also separate the backend.

Btw. I very much value my privacy, but I think the term "spying" shouldn't be used in this context. It is telemetry data, users have to explicitly opt-in, and the collected data is publicly shared--quite different to how spies operate.

konnov commented 4 years ago

I don't think we have to invent our own telemetry. Just call yours in a different instance of java inside the script.

lemmy commented 4 years ago

IMO this leads to an infrastructure question: Do you expect the Toolboxes to launch Apalache via the apalache-mc script in the long run?

konnov commented 4 years ago

I don't think so. We have the class called Tool that can be called from JVM. However, Toolbox gathers the statistics itself, right?

lemmy commented 4 years ago

No, the Toolbox (Eclipse & VScode) simply creates the file ~/.tlaplus/esc.txt and leave the actual reporting to the tool. The reason being that the Toolbox doesn't have access to all of the telemetry data.

konnov commented 3 years ago

I will check, whether we can quickly plug-in telemetry and add the opt-in option.

konnov commented 3 years ago

See #415

konnov commented 3 years ago

@lemmy the opt-in message in Toolbox is talking about TLC. Could you also add a line that this opt-in works for Apalache (and probably other TLA+ tools) too? Otherwise, it may be seen as an unexpected behavior.

lemmy commented 3 years ago

It's perhaps best to replace TLC with TLA+ Tools.

lemmy commented 3 years ago

Are all other items of the list going to apply?

Total number of cores and cores assigned to TLC Heap and off-heap memory allocated to TLC TLC version (git commit SHA) If breadth-first search, depth-first search or simulation mode is active TLC implemenation for the sets of seen and unseen states If TLC has been launched from an IDE Name, version, and architecture of the OS Vendor, version, and architecture of JVM The current date and time An installation ID which allows to group execution statistic (optionally)

konnov commented 3 years ago

So far, here is what I managed to report:

Total number of cores and cores assigned (the latter is 1 for now, but will change soon)
Heap memory (I am not sure what to do about the off-heap memory)
Version (semantic version + build)
mode: parse, check, typecheck
do you want to know, if Apalache is run in an IDE?
Name, version, and architecture of the OS
Vendor, version, and architecture of JVM
Timestamp
An installation ID which allows to group execution statistic (optionally)

We only enable/disable statistics in the apalache command line. There is no choice between ID and no ID. But if the ID is disabled in Toolbox, your code will respect it.

konnov commented 3 years ago

This one is done.

lemmy commented 3 years ago

So far, I don't see Apalache stats in the data.

konnov commented 3 years ago

There should be a few entries when I tried to run it on my computer. Apart from that, it is not released yet.

konnov commented 3 years ago

It is showing up here: https://exec-stats.tlapl.us/

Can it be a parsing problem? E.g., our version string is longer than that of TLC.

lemmy commented 3 years ago

To keep the backend simple, can you please also report:

result.put("osVersion", System.getProperty("os.version"));
result.put("jvmOffHeapMem", "0");

konnov commented 3 years ago

Added. Just have to wait until PR is built.

konnov commented 3 years ago

There is at least one entry at exec-stats now and I see the apalache entry!

konnov commented 3 years ago

Closing the issue.

lemmy commented 3 years ago

do you want to know, if Apalache is run in an IDE?

I and others have been setting ide to e.g. "github" when TLC runs as part of automation.

konnov commented 3 years ago

Why would you collect stats for the github runs?

lemmy commented 3 years ago

Why not? It tells if users run our tools as part of automation and we can easily exclude the datapoints via ide in other queries. For example, I wouldn't be surprised if automation correlates with beefier machines.

apalache-mc / apalache

[FEATURE] Add execution statistics to Apalache #288