JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.6k stars 5.48k forks source link

randomize profile interval? #9224

Open StefanKarpinski opened 9 years ago

StefanKarpinski commented 9 years ago

It's plausible that using a uniform profiling interval could produce biased samples since the interval might be in lock step with what the code is doing. Should we using random intervals for the profiler to avoid this possibility?

johnmyleswhite commented 9 years ago

Maybe you want exponentially distributed samples to produce "uniformity" in time?

StefanKarpinski commented 9 years ago

Possible, although I'm not sure that we necessarily want a Poisson process. Just avoiding the possibility of lockstep may be adequate. This may not even be a realistic problem. I'm not sure.

timholy commented 9 years ago

I wondered about this, but decided to wait for someone to come up with a test case that demonstrates a problem. So far it hasn't happened, but it's probably fair to say that most people take profiling results at face value.

There is a periodic event that does dramatically influence profile results: garbage collection. But that's a completely separate issue, and would not be affected by randomizing the interval.

mschauer commented 9 years ago

If it turns out to be really necessary, thinking about it as an antialiazing problem suggest sampling at a uniform time point in the fixed size interval .

StefanKarpinski commented 9 years ago

That is kind of a good way to look at it.

batesl87 commented 7 years ago

The regular intervals of the profiler are starting to become an issue for me. I'm building a large graph structure and storing it in a custom memory-mapped array (it can reach a billion edges). The graph is very regular and as a result the build process is regular too.

The profiler results can be misleading. (I added a random pause within a function and it significantly changed the results)

Is there a way to randomise the profiling interval from within Julia?

timholy commented 7 years ago

Assuming you mean make each interval different from the other, currently no there isn't. If you're not up for modifying the C code yourself, the next best contribution would be to post here with a bit more quantitative evidence for the scale of the problem. (Preferably with code that others can run, if that's feasible.) Fixing this is not a simple thing and will likely have performance costs that affect all users of the profiler. For whoever does fix this, it would be helpful if the trade offs could be made in way that was based on numbers rather than suspicions and hearsay. (This is not a slight against your comment, it's the first concrete report so far! Simply that more detail would be oh-so-useful.)

batesl87 commented 7 years ago

Thanks Tim, I appreciate that this is odd that I'm the first to report it. I can't upload the code that has produced the issue, but I'm trying to come up with a simpler case study.

I've looked through the c code for the profiler on Windows and found what I think would be a relatively simple fix. It looks like the delay in the profiler is implemented through Sleep(timeout) on line 340 of signals-win.c. Would it be sensible to apply a random scaling factor (at most 50% either side) to the timeout value?

timholy commented 7 years ago

That seems like a sensible approach. I'm a little more concerned about the performance impact for Unix, though; the profiler samples much more frequently on those platforms.

But looking a little more deeply, it seems that random number generation won't be a big bottleneck:

julia> @time rand(1000);
  0.000024 seconds (6 allocations: 8.109 KB)

So most likely my fears about a performance hit are unfounded.

EDIT: In other words, if you can demonstrate that your change hasn't made anything worse, then I can't imagine any basis for complaining.

batesl87 commented 7 years ago

I have made some changes to profile_bt that I believe should improve the profiler for Windows.

The code utilises QueryPerformanceCounters for the timer and Sleep(0) to minimise resource consumption. I believe this approach should work well for the profiler as a uniform interval is not a desired trait.

https://github.com/batesl87/julia/commit/30e5c4b6e24828cbc437f5141568602080dc725e#diff-eb318045d5a3ea7069b5338c19e9b00f

timholy commented 7 years ago

That link doesn't work for me (I get a 404 error).

batesl87 commented 7 years ago

Not sure what's happening there. Here is the updated function.

static DWORD WINAPI profile_bt( LPVOID lparam )
{
    // Note: illegal to use jl_* functions from this thread
    TIMECAPS tc;
    if (MMSYSERR_NOERROR!=timeGetDevCaps(&tc, sizeof(tc))) {
        fputs("failed to get timer resolution",stderr);
        hBtThread = 0;
        return 0;
    }
    **//Precision timer
    LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
    LARGE_INTEGER Frequency;
    QueryPerformanceFrequency(&Frequency);**

    while (1) {
        if (running && bt_size_cur < bt_size_max) {
            **//calculate start time for high precision sleep
            QueryPerformanceCounter(&StartingTime);
            //multiply timeout(microseconds) by random factor between 0 and 2
            DWORD timeout = (int)((nsecprof*((double)rand() / (double)RAND_MAX*2.0)/1000));
            //if timeout is greater than a millisecond use Sleep
            if (timeout < 1000)
            {
                int time_elapsed = 0;
                while (time_elapsed ==0)
                {   
                    QueryPerformanceCounter(&EndingTime);

                    ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
                    ElapsedMicroseconds.QuadPart *= 1000000;
                    ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;

                    if (ElapsedMicroseconds.QuadPart >= timeout)
                    {
                        time_elapsed = 1;
                    }
                    else
                    {
                        //if time has not elasped offer up processor to another thread
                        Sleep(0);
                    }
                }
            }
            else
            {
                Sleep((int)(timeout/1000));
            }**

            if ((DWORD)-1 == SuspendThread(hMainThread)) {
                fputs("failed to suspend main thread. aborting profiling.",stderr);
                break;
            }
            CONTEXT ctxThread;
            memset(&ctxThread, 0, sizeof(CONTEXT));
            ctxThread.ContextFlags = CONTEXT_CONTROL | CONTEXT_INTEGER;
            if (!GetThreadContext(hMainThread, &ctxThread)) {
                fputs("failed to get context from main thread. aborting profiling.",stderr);
                break;
            }
            // Get backtrace data
            bt_size_cur += rec_backtrace_ctx((uintptr_t*)bt_data_prof + bt_size_cur,
                bt_size_max - bt_size_cur - 1, &ctxThread);
            // Mark the end of this block with 0
            bt_data_prof[bt_size_cur] = 0;
            bt_size_cur++;
            if ((DWORD)-1 == ResumeThread(hMainThread)) {
                fputs("failed to resume main thread! aborting.",stderr);
                gc_debug_critical_error();
                abort();
            }
        }
        else {
            SuspendThread(GetCurrentThread());
        }
    }
    hBtThread = 0;
    return 0;
}
timholy commented 7 years ago

Any chance you could turn that into a pull request? It might not be merged until someone does the same for Unix, but this would be a great start.

kmsquire commented 7 years ago

(The link on github seems to work now.)

StefanKarpinski commented 4 years ago

This seems to be a non-issue at this point? Or is this still relevant? @timholy, feel free to close.

vtjnash commented 4 years ago

It's still relevant—worse now actually since we also round-robin sample each of the threads in the same order.

NHDaly commented 3 years ago

Per @tveldhui's comment here: https://github.com/JuliaLang/julia/issues/33490#issuecomment-542156613

so the process has time to return to the stationary distribution (sampling interval >> mixing time).

How do you know what the "mixing time" is for a process? I'm not familiar with this term.

@vtjnash: I've already forgotten why we thought it made more sense to permute the order of the round-robin sampling of the threads, instead of switching to only sampling one thread at a time. Can you remind me? Was it basically just that it seemed like a smaller change, so maybe an easier first-step?

vtjnash commented 3 years ago

For single-threaded profiling, it means you get 1/n the number of samples for the thread of interest.