biolab / orange3

🍊 📊 💡 Orange: Interactive data analysis
https://orangedatamining.com
Other

Reconstruct .pyx files #6417

Closed yangsia closed 1 year ago

yangsia commented 1 year ago

What's your use case?

Installing Orange3 from source is painful because it requires a working C++ compiler.

How about rewriting Orange's .pyx files to make it 100% pure Python? I have already converted 8 of the 9 .pyx files. The last one is _io.pyx, which is used only by the BasketReader class. Its function "sparse_read_float" is tedious to port, and it cimports "libc.stdio" and "libc.string". If Orange3 were 100% pure Python, it would be much easier to use across platforms. Cython is out of date. Recent Python versions are also very efficient, so the speed advantage of C++ code no longer exists.

What's your proposed solution?

  1. Replace all 9 .pyx files with .py code.
  2. Test the source code.
  3. Change the setup process to skip the C++ compilation step.
  4. Orange3 becomes 100% pure Python.

I have converted 8 of the 9 .pyx files. The last one is _io.pyx, which is used only by the BasketReader class. The function "sparse_read_float" in _io.pyx is tedious, and the imported "libc.stdio" and "libc.string" libraries are too low-level to rewrite easily. If anyone familiar with _io.pyx or the basket data format could help, I would share my code so we can work on it together.
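For illustration only, here is a minimal sketch of what a pure-Python reader for basket-style data might look like. It assumes a simplified format (one row per line, comma-separated items written as `name=value` or a bare `name` meaning 1.0, no quoting or escapes) and is not a port of "sparse_read_float"; the function name `read_basket` is hypothetical.

```python
import io


def read_basket(lines):
    """Parse simplified basket lines into CSR-style lists.

    Assumed format (an illustration, not the full Orange basket syntax):
    each line is one row, items are separated by commas, and an item is
    either "name=value" or a bare "name" (treated as value 1.0).
    """
    data, indices, indptr = [], [], [0]
    columns = {}  # feature name -> column index, in order of first appearance
    for line in lines:
        line = line.strip()
        if not line:
            continue  # this sketch skips blank lines instead of adding empty rows
        for item in line.split(","):
            item = item.strip()
            if not item:
                continue
            name, sep, value = item.partition("=")
            col = columns.setdefault(name.strip(), len(columns))
            indices.append(col)
            data.append(float(value) if sep else 1.0)
        indptr.append(len(indices))  # marks the end of the current row
    return data, indices, indptr, list(columns)


data, indices, indptr, names = read_basket(io.StringIO("a=2, b\nb=0.5, c=3\n"))
```

If SciPy is available, the three lists can be passed as `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(indptr) - 1, len(names)))` to build the sparse matrix.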

Are there any alternative solutions?

markotoplak commented 1 year ago

We'd also prefer to drop the C code. Could you let me know what you used instead for speed-critical parts? Numba? Something else? Just leaving it to Python and hoping for the best?

yangsia commented 1 year ago

> We'd also prefer to drop the C code. Could you let me know what you used instead for speed-critical parts? Numba? Something else? Just leaving it to Python and hoping for the best?

I just used numpy and heapq. I ran some of the example workflows and they worked fine.

markotoplak commented 1 year ago

I imagine that all example workflows should work fine because they are particularly non-demanding. Did you perhaps also benchmark the specific parts the C code was used for?

We specifically chose some parts for implementation in C because we could not make the Python versions run fast enough (on bigger datasets). But yes, some decisions might be out-of-date, and having a pure Python fallback implementation for when no C compiler is available would certainly make sense.
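A fallback of this kind is commonly written as a guarded import. The sketch below is only an assumption about how it could look: the module name `_orange_compiled_example` and the function `manhattan` are hypothetical and do not refer to actual Orange code.

```python
try:
    # Hypothetical compiled extension; present only if a C compiler was
    # available at install time. The module name is made up for this sketch.
    from _orange_compiled_example import manhattan
    USING_COMPILED = True
except ImportError:
    USING_COMPILED = False

    def manhattan(x, y):
        """Pure-Python fallback: sum of absolute coordinate differences."""
        return sum(abs(a - b) for a, b in zip(x, y))


print(manhattan([1, 2, 3], [2, 4, 6]), "compiled" if USING_COMPILED else "pure Python")
```

Callers import `manhattan` from one place and get the fast version when it exists, so the compiler becomes optional rather than required.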

janezd commented 1 year ago

@yangsia, any progress? How is the pure Python code doing speed-wise - what do benchmarks show?

yangsia commented 1 year ago
Hi,

Long time no see. We have been occupied by other projects, and the benchmarking has not begun yet. Is there any well-established benchmark you are interested in? If we return to this problem, such tests would be helpful.

Best regards,
Tianji Yang
Associate Researcher, Big Data Group
Shenyang Institute of Automation, Chinese Academy of Sciences

janezd commented 1 year ago

We only write benchmarks when some new code is expected to affect performance (for better or worse). Following this policy, there are no benchmarks for the code you are changing, but replacing this code would require providing such benchmarks to show that there is no considerable degradation in speed.

This is based on a real concern. Cython is mostly used in code that we couldn't implement using vectorized operations in NumPy. I would expect that just replacing loops in Cython with equivalent code in Python, without any further tricks, would slow it down by a factor of 10-50. That's why this needs to be benchmarked to show this is not the case.
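A benchmark of the kind requested could be as small as the sketch below. It uses only the standard library; `reference_sum_sq` stands in for the existing implementation (in practice the compiled Cython function) and `candidate_sum_sq` for its replacement. Both functions and the `slowdown` helper are hypothetical names for this illustration.

```python
import math
import timeit


def reference_sum_sq(values):
    """Stand-in for the existing implementation."""
    return sum(v * v for v in values)


def candidate_sum_sq(values):
    """Stand-in for the proposed replacement."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def slowdown(reference, candidate, args, repeat=5, number=20):
    """Return candidate time / reference time (values above 1 mean slower)."""
    # Check that both implementations agree before timing them.
    assert math.isclose(reference(*args), candidate(*args))
    t_ref = min(timeit.repeat(lambda: reference(*args), repeat=repeat, number=number))
    t_new = min(timeit.repeat(lambda: candidate(*args), repeat=repeat, number=number))
    return t_new / t_ref


values = [float(i) for i in range(10_000)]
ratio = slowdown(reference_sum_sq, candidate_sum_sq, (values,))
print(f"candidate takes {ratio:.2f}x the reference time")
```

Taking the minimum over several repeats reduces noise from other processes. A meaningful comparison would also run on datasets large enough to reflect the cases the C code was written for.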

janezd commented 1 year ago

Closed due to inactivity.