Stewori / JyNI

Enables Jython to load native CPython extensions.
https://jyni.12hp.de
Other
152 stars 17 forks source link

PyFile support #11

Open whilo opened 7 years ago

whilo commented 7 years ago

I have explored supporting matplotlib font loading, which require PyFile support.

I have commented in

FILE *
PyFile_AsFile(PyObject *f)
{
    printf("Accessing file.\n");
    if (f == NULL || !PyFile_Check(f))
        return NULL;
    else
        return ((PyFileObject *)f)->f_fp;
}

Which is called and immediately SIG_ABORTS. I get a dump file from the JVM:

Register to memory mapping:

RAX=0x00007fb7cbdd8280: PyFile_Type+0 in /home/christian/Development/JyNI/build/libJyNI.so at 0x00007fb7cbaa9000
RBX=0x00007fb7f8070958 is an unknown value
RCX=0x00007fb7c055d390 is an unknown value
RDX=0x0000000000000001 is an unknown value
RSP=0x00007fb8278190e8 is pointing into the stack for thread: 0x00007fb82000a000
RBP=0x00007fb7f8032e30 is an unknown value
RSI=0x00007fb7c055d390 is an unknown value
RDI=0x00007fb7f8032e30 is an unknown value
R8 =0x0000000000000000 is an unknown value
R9 =0x0000000000000004 is an unknown value
R10=0x0000000000000319 is an unknown value
R11=0x00007fb7cbb08b65: PyErr_SetString+0 in /home/christian/Development/JyNI/build/libJyNI.so at 0x00007fb7cbaa9000
R12=0x0000000000000000 is an unknown value
R13=0x0000000000000000 is an unknown value
R14=0x00007fb827819130 is pointing into the stack for thread: 0x00007fb82000a000
R15=0x0000000000000001 is an unknown value

Stack: [0x00007fb82771f000,0x00007fb827820000],  sp=0x00007fb8278190e8,  free space=1000k
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 3935  JyNI.JyNI.callPyCPeer(JLorg/python/core/PyObject;Lorg/python/core/PyObject;J)Lorg/python/core/PyObject; (0 bytes) @ 0x00007fb81183612c [0x00007fb8118360c0+0x6c]
j  JyNI.PyCPeerType.__call__([Lorg/python/core/PyObject;[Ljava/lang/String;)Lorg/python/core/PyObject;+36
J 2911 C1 org.python.core.PyObject.__call__(Lorg/python/core/PyObject;)Lorg/python/core/PyObject; (16 bytes) @ 0x00007fb811ae0fec [0x00007fb811ae0e80+0x16c]
J 5454 C2 org.python.core.PyObject.__call__(Lorg/python/core/ThreadState;Lorg/python/core/PyObject;)Lorg/python/core/PyObject; (6 bytes) @ 0x00007fb8125b22e0 [0x00007fb8125b22a0+0x40]
j  matplotlib.font_manager$py.createFontList$16(Lorg/python/core/PyFrame;Lorg/python/core/ThreadState;)Lorg/python/core/PyObject;+673
j  matplotlib.font_manager$py.call_function(ILorg/python/core/PyFrame;Lorg/python/core/ThreadState;)Lorg/python/core/PyObject;+368

Just returning NULL is not sufficient as a quick hack and I guess that I need to wrap a Jython class for PyFile instead of accessing the pointer directly. You said it would be easy to fix, but in the comment you say:

//Providing the following method via JyNI would be very hard or impossible.
//However, we could create a FILE-Pointer to the same file and also adjust
//io mode and seek-position to the values of the given PyFile.
//But this may lead to conflicts in systems not supporting multiple file
//access (i.e. windows) and would not stand equality-checks via pointer-wise "==".

I had some frustrations with filesystem APIs lately for my kv-store implementation, so I would also be careful with double access. What are your suggestions to proceed? Also do you have some chat channel? This might be easier to get started.

Stewori commented 7 years ago

First of all, note that PyFile_WriteString is already implemented and can serve as a pattern. I remember I once decided it would be best to keep PyFile-stuff on Jython-side and call into that stuff on C-side. Such decisions are mostly made by carefully reviewing the header file. Especially if there are no macros accessing an object's internals (other than type, length, size and refcounter), the best way is to implement a Jython-centric solution like shown in PyFile_WriteString.

That said, look at https://github.com/Stewori/JyNI/blob/master/JyNI-C/include/JyNI_JNI.h#L508 and https://github.com/Stewori/JyNI/blob/master/JyNI-C/src/JyNI_JNI.c#L449 and https://github.com/Stewori/JyNI/blob/master/JyNI-C/src/JyNI_JNI.c#L1100. At these three spots you must declare functions in Jython for C-access. https://github.com/Stewori/JyNI/blob/master/JyNI-C/include/JNI_util.h declares a bunch of magic and macros to simplify JNI-based declarations. Originally it was even more painful. See the macros https://github.com/Stewori/JyNI/blob/master/JyNI-C/include/JNI_util.h#L133 and below. These are meant for public access: JNI_METH_CLASS(classID, name, ret, ...) JNI_METH_INTERFACE(classID, name, ret, ...) JNI_METH_CLASS2(classID, name, cName, ret, ...) JNI_METH_STATIC(classID, name, ret, ...) JNI_METH_STATIC2(classID, name, cName, ret, ...) JNI_FIELD(classID, name, tp) JNI_FIELD_STATIC(classID, name, tp) JNI_CONSTRUCTOR(classID, cName, ...) JNI_SINGLETON(classID, name, tp) JNI_SINGLETON2(classID, name, cName, tp)

Yes, I have to document these (please help with that if you can!) Look at https://github.com/Stewori/JyNI/blob/master/JyNI-C/src/JyNI_JNI.c for plenty usage examples. Feel free to ask further questions if you struggle with using these macros.

Note that JNI_METH(classID, name, ret) and JNI_METH_1(classID, name, ret, arg1, tp1) at the very bottom are still unfinished/experimental and not used by JyNI. The plan is that they would also add convenient calling-code rather than just the declaration.

Stewori commented 7 years ago

In http://www.kfu.com/%7Ensayer/Java/jni-filedesc.html we can see that java.io.FileDescriptor has a private int-field fd. I hope we can access that to implement PyFile_AsFile. A Jython PyFile uses org.python.core.RawIO as backend. Most likely this would be a FileIO, which contains a RandomAccessFile, which in turn contains a java.io.FileDescriptor. This will involve some struggling with private API and is thus best done directly from JNI (we might have to revisit the solution for Java 9). It might be a bit tricky to get this right; good luck!

Stewori commented 7 years ago

Functions that don't access internals of the PyFileObject struct, e.g. PyObject_AsFileDescriptor can be commented in without a change. If they call other functions, we should focus on implementing these.

whilo commented 7 years ago

Ok, I have followed your advice. I have first exported the filedescriptor, which was already accessible in the Jython API (through the usual reflection hack in FileIO.__int__():

https://github.com/whilo/jython/commit/68ed56a90c945984d6005e755e6e0446ca57cc01

This was much easier than unwrapping the nested objects through the C API for me, but this is obviously only a hack to see how far I can get. I then added the necessary method declarations as you suggested:

https://github.com/whilo/JyNI/commit/c51c27754f80c0b807e1640b6c1c3a6b0f168383

I can import matplotlib.pyplot now (pylab still crashes):

https://github.com/whilo/JyNI/commit/c51c27754f80c0b807e1640b6c1c3a6b0f168383#diff-753c9fd4810cb43fc681a2eae65157cc

I can create a plot object, but after the tkinter window pops up the process immediately crashes.

christian@lacan ~/D/JyNI> java -cp jython.jar:build/JyNI.jar org.python.util.jython 
Jython 2.7.1rc3 (, Jun 19 2017, 12:45:06) 
[OpenJDK 64-Bit Server VM (Oracle Corporation)] on java1.8.0_131
Type "help", "copyright", "credits" or "license" for more information.
>>> import setup_path
>>> import matplotlib.pyplot as plt
>>> plt.plot([1,2,3])
[<matplotlib.lines.Line2D object at 0x2a3>]
>>> plt.show()
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f93d1a0bdc8, pid=16675, tid=0x00007f93fa375700
#
# JRE version: OpenJDK Runtime Environment (8.0_131-b11) (build 1.8.0_131-8u131-b11-0ubuntu1.17.04.1-b11)
# Java VM: OpenJDK 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [multiarray.x86_64-linux-gnu.so+0x5adc8]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/christian/Development/JyNI/hs_err_pid16675.log
Compiled method (c1)   48634 4793       1       JyNI.gc.DefaultTraversableGCHead::setLinks (6 bytes)
 total in heap  [0x00007f93e17affd0,0x00007f93e17b0288] = 696
 relocation     [0x00007f93e17b00f8,0x00007f93e17b0120] = 40
 main code      [0x00007f93e17b0120,0x00007f93e17b01c0] = 160
 stub code      [0x00007f93e17b01c0,0x00007f93e17b0250] = 144
 oops           [0x00007f93e17b0250,0x00007f93e17b0258] = 8
 scopes data    [0x00007f93e17b0258,0x00007f93e17b0260] = 8
 scopes pcs     [0x00007f93e17b0260,0x00007f93e17b0280] = 32
 dependencies   [0x00007f93e17b0280,0x00007f93e17b0288] = 8
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
fish: 'java -cp jython.jar:build/JyNI.…' terminated by signal SIGABRT (Abbruch)

I can read the error log, but I guess I should study the dumps? It points to multiarray, so I guess it is not related to further PyFile methods. I would tackle them, but first I would like to get matplotlib basically running. What are your suggestions?

Stewori commented 7 years ago

Is the issue reproducible? Can you isolate the Python code in matplotlib that provokes the crash? First thing to do is to create a minimal Python code sample (ideally independent from matplotlib) that reproduces the crash. Then post this sample as a new issue, so this thread can focus on PyFile support.

whilo commented 7 years ago

Issue is here now: https://github.com/Stewori/JyNI/issues/12

Stewori commented 7 years ago

Whilo, could you build a PR from what you achieved so far? That would make it easier for others (including myself) to reproduce the subsequent issue you describe.

Please adjust your approach such that it works without modifications in Jython. IMO this would be not too complicated (one additional native field access). On the other hand, having in mind that Jython is currently in late RC-phase for 2.7.1 release, such a change would only make it into Jython 2.7.2 and I wouldn't make a guess when this release will happen (hopefully the huge delay of 2.7.1 was an exception, but let's better not rely on that).

whilo commented 7 years ago

Sure, I can do that. The current approach is a hack to get something working and start from there. I don't think I need a patched Jython version.

Stewori commented 7 years ago

Hey whilo. Just posting a polite reminder to file a PR for your work on PyFile. It would be a pity to loose this, even if it's just a starting point so far...

whilo commented 7 years ago

Thanks for the reminder, it is high on my todo list. :)

Stewori commented 7 years ago

So, my work on porting JyNI to Windows is almost done. This means, it is a good time for a next release. Do you think we can get some PyFile support into JyNI alpha 5?

whilo commented 7 years ago

Thanks for insisting. I will sit down and work on it this evening. Will you be online on IRC?

Stewori commented 7 years ago

I can go online on IRC later. Note that it's not due exactly today. I'd like to make the release somewhat end of the week or next weekend. Also, it's not totally crucial to get PyFile into this release, but it would be nice.

Stewori commented 7 years ago

So, I am observing IRC log frequently, but I cannot guarantee that I can be online at a particular time. Ithink it would be best to resolve questions here in an async manner.

whilo commented 7 years ago

Ok, I have been there late in the evening, but I guess it was too late :). I have tried to do the JVM PyFile API calls in the C context, but somehow printing (printf) does not work anymore. I am not sure why.

Stewori commented 7 years ago

printf prints async regarding JVM console output. In many consoles printf and puts output will only show up on JVM shutdown or at random-like time points when C output is flushed. If JyNI.h is included, use jputs for text or jputsLong for integers; their output will be in sync with JVM output. I didn't yet write an actual jprintf, feel free to add one if you need it. Do you need printf for anything else than debugging?

whilo commented 7 years ago

Right, the C routine PyFile_AsFile(PyObject *f) is not called anymore when importing matplotlib it seems. Printing works. Is there an easy way to test it without depending on matplotlib? I am not sure what other libraries need native pyfile access. I can have a look later.

Stewori commented 7 years ago

Maybe there is some test code in CPython. Otherwise we should maybe write our own test(s). DemoExtension is intended to serve as a collection for native test code.

whilo commented 7 years ago

Ok, deleting the fontcache of matplotlib did the trick for now. But now I need to cast and access the FileIO object and I have no clue how to do that with JNI.

I will try to figure it out, but having a general chat medium with offline capability, code formatting and mail notifications would really be helpful for this, especially to have some period of time where I can just ask you dumb questions. From the open-source projects I run and support I can really recommend setting up gitter. It is open-source, is integrated with github and takes maybe five minutes to setup.

Stewori commented 7 years ago

Maybe you can add PyFile.fd as a static method to JyNI.java, e.g. like static int JyNI_PyFile_fd(PyFile file). Then it would be a single easy JNI call you can do the same way like you called PyFile.fd. But then we'd keep changes to JyNI and needn't modify Jython.

whilo commented 7 years ago
FILE *
PyFile_AsFile(PyObject *f)
{
    printf("Accessing file.\n");
    env(-1);
    if (f == NULL)
        puts("PyFile_AsFile with NULL-pointer");
    jobject f2 = JyNI_JythonPyObject_FromPyObject(f);
    //(*env)->CallVoidMethod(env, ((JyObject*) f)->jy, pyFileWrite, (*env)->NewStringUTF(env, s));
    jobject fileno = (*env)->CallObjectMethod(env, f2, pyFile_fileno);
    printf("fileno: %i \n", fileno);
    jobject fd_pyint = (*env)->CallObjectMethod(env, fileno, FileIO___int__);
    jint fd_int = (*env)->CallObjectMethod(env, fd_pyint, pyInt_getValue);
    printf("fd_int: %i \n", fd_int); // prints 0 ?!?
    jint fd = (*env)->CallObjectMethod(env, f2, pyFile_fd);
    printf("FD: %i \n", fd);
    // TODO get mode from PyFile
    return fdopen(fd, "r");
    //todo: JNI Exception handling
}
whilo commented 7 years ago

The fd_int printf line prints 0 and not the correct fd value. I guess that casting happens implicitly and if I try to call a non-existing method I get an exception from the JVM.

whilo commented 7 years ago

So I ignored the casting issue, but the result makes no sense so far to me. I guess PyInteger is 0 by default due to JVM Integer being 0 by default, but I am at least getting a proper instance of it, which is weird. If I had done something wrong on the way to access it, it should have thrown or caused a segmentation fault.

whilo commented 7 years ago

I have implemented the same path to fd as in PyFile.fd() now, so no idea why it fails.

whilo commented 7 years ago

If my current approach in C is actually worse in your opinion than adding a static method to JyNI (which is also somewhat odd by providing an external method for PyFile), then I will take that way. I thought it would be right to do it from C, without implementing custom helpers.

Stewori commented 7 years ago
jint fd_int = (*env)->CallObjectMethod(env, fd_pyint, pyInt_getValue);

should rather be

jint fd_int = (*env)->CallIntMethod(env, fd_pyint, pyInt_getValue);

That should fix it. (in jint fd = (*env)->CallObjectMethod(env, f2, pyFile_fd); respectively)

If my current approach in C is actually worse in your opinion than adding a static method to JyNI (which is also somewhat odd by providing an external method for PyFile)

Putting a sequence of Java calls to Java side is faster AFAIK, because it reduces calls from C into JVM which are JNI's slowest facet. That said, there is plenty of such stuff in JyNI. I guess I became a little painless about this. We can maybe add a proper method to Jython, but until that finds its way into a release, an external implementation in JyNI is a good migration path.

CalumFreeman commented 6 years ago

I've started working on finishing implementing the PyFile API. It's on a separate branch/fork and I hope to have decent tests for the API before merging it with master. But if a working PyFile API is needed for anything then I can submit a pull request earlier.

CalumFreeman commented 6 years ago

Question:

I am unsure as to what to do about the memory management functions in PyFile, I feel like there will be some details about JyNI GC that will come into play here. The functions I am most unsure about are:

PyFile_DecUseCount, PyFile_IncUseCount, tp_dealloc and tp_weaklistoffset.

There are generic object functions for tp_alloc and tp_free, I haven't tested them yet so they may not work as they may also have quirks.

General Progress Report:

My current progress is on my Implementing-PyFile branch. Before submitting a pull request I will fix the git history and clean things up(I'm not used to memory management, variables aren't always declared at the start of blocks, etc). I just thought it was worth having the progress somewhere other than just my local eclipse workspace. Most of PyFile is done and passes tests (33/46).

Other than the memory management things in the question, the remaining functions that don't have tests that they pass are: tp_iternext, tp_iter, tp_init and PyFile_WriteObject - I'm finding these tricky but I hope to finish them on Monday file_exit, file_enter and file_truncate - It wasn't instantly obvious what these do and I haven't looked into them yet, hopefully they are easy enough and I can also do them on Monday.

Tests: I've put the tests in a separate file, I think ideally there would be a C file and python file for each section of the API. I had looked into the CPython test suite, they do have tests for the Capi, but it seemed quite difficult to get them to run as they wouldn't import without all of the Capi being available/declared. I tried just uncommenting a lot of code in the hope of fooling the tests enough to import (of course they would still fail) but this didn't work very well. I also tried to cut out just the tests I wanted from their test module, at which point I realised I couldn't actually find any that tested the PyFile Capi specifically. At this point I gave up on trying to use the CPython tests and just wrote my own.

Stewori commented 6 years ago

Re UseCount: We need to figure out what fobj->unlocked_count actually means and - more importantly - how Jython implements that functionality. It looks like kind of an additional ref counter that counts how many objects/threads/? are currently using the file. Usually I would assume the ref counter could take care of this, but there must have been a reason why they added another counter. Maybe because the file handle int is not necessarily the same thing as the file object. I.e. the handle can be used by more participants than the file object. In that case it would make sense that fobj->unlocked_count is kind of a reference counter for the file handle while the ordinary ref counter manages the file object. @CalumFreeman That's just a guess from looking at the code. Is there something in CPython docs?

It looks like in Jython the role of the handle is played by a TextIOBase object, no obvious use counter so far. Maybe it's just using Java memory management w.r.t. TextIOBase to provide use count equivalent guarantees. In general that seems not so much an issue in Jython as there is not really public API to get the handle. I'd suggest we handle it in JyNI by maintaining the field fobj->unlocked_count in native PyFile objects and regart the Java PyFile object as constantly holding one use count. So it the counter would be initialized to one if a Java PyFile object is handled to native side. A purely natively created file object gets the usual initialization and if it is passed to Java side use count it incremented. Remains the question how we tidy up. Can we somehow hook into PyFile.close or so? E.g. by injecting a custom PyFile.Closer the wraps the original closer but additionally informs JyNI.

Stewori commented 6 years ago

Okay, I took a closer look. In CPython there is this code snippet:

static PyObject *
close_the_file(PyFileObject *f)
{
    int sts = 0;
    int (*local_close)(FILE *);
    FILE *local_fp = f->f_fp;
    char *local_setbuf = f->f_setbuf;
    if (local_fp != NULL) {
        local_close = f->f_close;
        if (local_close != NULL && f->unlocked_count > 0) {
            if (f->ob_refcnt > 0) {
                PyErr_SetString(PyExc_IOError,
                    "close() called during concurrent "
                    "operation on the same file object.");
...

Regarding the error case close() called during concurrent operation on the same file object. the counterpart in Jython is not so obvious. I think it relies on the JVM throwing an IOException at FileIO:

public void close() {
        if (closed()) {
            return;
        }
        try {
            fileChannel.close();
        } catch (IOException ioe) {
            throw Py.IOError(ioe);
        }
        super.close();
    }

I suggest to write a new subclass of FileIO, e.g. JyNIFileIO. For simplicity it wraps the original FileIO and delgates everything to its backend. From JNI we can inject this into PyFile.file. It would implement close such that a maybe native counterpart is checked for fobj->unlocked_count > 0. I guess the best idea would be to call native close_the_file to let it perform that check. In this scope I guess it's not necessary - unlike I suggested earlier - to track an fobj->unlocked_count count for the Java PyFile. Simply initialize it to zero, maintain the counter like C code currently does. We only have to incorporate this into the check in FileIO.close. Regarding the constructor I think we should implement it like other objects fully delegated to Jython. E.g. look at set.

static void
file_dealloc(PyFileObject *f)
{
    JyNIDebugOp(JY_NATIVE_FINALIZE, f, -1);
    // We will revisit that:
    //if (f->weakreflist != NULL)
    //    PyObject_ClearWeakRefs((PyObject *) f);
    ret = close_the_file(f);
    if (!ret) {
        PySys_WriteStderr("close failed in file object destructor:\n");
        PyErr_Print();
    }
    else {
        Py_DECREF(ret);
    }
    Py_TYPE(so)->tp_free(so);
}

Did you already implement close_the_file? It should only perform the check for use count and then delegate to Jython's FileIO.close in some way. Make sure there won't be a delegation cycle. See if you can figure that out. Otherwise we can follow up on that part here later.

Hope this helps! I admit that makes PyFile implementation a bit more complicated than I originally expected.

Stewori commented 6 years ago

Maybe it would be cleaner to implement PyFile_IncUseCount and PyFile_DecUseCount in Jython's original FileIO class. Via FileIO.__int__() Jython offers access to the file handle which a user could use in C-code (e.g. by JNI or JNR or so independently from JyNI) to access the file directly. However according to PyFile_IncUseCount doc one would have to call PyFile_IncUseCount/PyFile_DecUseCount API around such an access. Since this API is missing in Jython a user currently does not have a chance to do it right. "Who would directly use the file handle to access the file?" one might ask, which I guess is why it was never implemented. Now we have an answer: C-extensions! I think I will open an issue to suggest adding this API.

Stewori commented 6 years ago

http://bugs.jython.org/issue2699

Stewori commented 6 years ago

@CalumFreeman I think the effort of implementing PyFile_IncUseCount and PyFile_DecUseCount properly would not really pay off. Even for C-extensions the use cases for these are not that common. I suggest we go with a dummy implementation until we encounter a C-extension that actually depends on this. Please implement them for now such that they just print out a warning, e.g. "Warning: PyFile_IncUseCount not implemented." Please include into the implementation a doc comment that links this thread (#11) and also http://bugs.jython.org/issue2699.

With that done I suggest to shift focus to iterator support as it will have much higher impact.

CalumFreeman commented 6 years ago

Ok, I've set PyFile_IncUseCount and PyFile_DecUseCount to just print warning messages, I've also cleaned up the code a little, but I haven't fixed the git history. I'll switch focus to iterator support although I'm not quite sure PyFile is ready for a pull request. I also realised that it was unnecessary to implement many of the file_* methods as the PyObject_GenericGetAttr finds the java PyFile methods automatically. This mostly works, but there is a problem with the jython fileno(), it doesn't return the file descriptor as expected (this is why the workaround was needed in PyFile_AsFile):

CPython:

>>> file = open("/tmp/JyNI", "w+")
>>> file.fileno()
3

Jython:

>>> file = open("/tmp/JyNI", "w+")
>>> file.fileno()
org.python.core.io.FileIO@282308c3

This would be best fixed in jython although it's probably only noticed in JyNI/C extensions and we already have a workaround for the place it is most important. I don't think it's worth spending time on at this point, iterator support is more important.

Stewori commented 6 years ago

Regarding fileno please also see http://bugs.jython.org/issue2320.

Stewori commented 6 years ago

So, the situation seems to be as follows: In issue 2320 it is mentioned that Python-level fileno may actually return an object that is convertible to an int rather than an actual plain int. That means, you/we should compareint(file.fileno()) between Jython and CPython rather than plain file.fileno(). Looking at fileno doc this behavior seems to be undocumented, but issue 2320 mentions there was a BDFL decision about this. Actually we should push for a doc correction but "for some reason" I'd expect that to be controversial among CPython crowd. So I'd suggest not to heat this up for old Python 2.7 but reconsider it once Jython 3 is approaching. Then, I wondered what the difference between fileno and PyFile_AsFile is or should be. It seems that on Windows, file handle and file descriptor are distinct terms, while on POSIX it's the same. In general, the file handle is the thing returned by open and the file descriptor is the thing returned by fileno. Interestingly, fileno consumes the thing returned by open. Tht said, Java has it that way that on Windows FileDescriptor.fd is always -1, I suppose to indicate it's a posix thing. According to FileDescriptor implementation on Windows we should look at the field FileDescriptor.handle. This is not exposed by Jython and we will have to get it directly via JNI. Remaining question is what to do with fileno (on Windows). On Linux/POSIX it can simply delegate to PyFile_AsFile or be a copy of it. On Windows we'll have to check what CPython does. Return -1? That's easy to emulate. Otherwise it's probably Window's notion of fileno we should call, i.e. _fileno(PyFile_AsFile(whatever)).

Stewori commented 6 years ago

not quite sure PyFile is ready for a pull request

Maybe you could commit your work to your own JyNI fork. That would be the preparation for a PR anyway. That would allow for an informal review.

CalumFreeman commented 6 years ago

There are some branches on my own fork that sort of show what I've been doing. I've split the testing out into it's own branch to avoid conflicts when building tests for iterators and pyfile.

PyFile has two branches, a messy one and a cleaned up one. The cleaned up one is (obviously) cleaner and they both pass the same tests so I think it is better. (There's also a branch for iterators, I've fixed the numpy.random.randint() bug but not much else. I want to do a lot more on that branch)

Stewori commented 6 years ago

Sorry, I somehow forgot to look for branches :) Be prepared for some comments...

CalumFreeman commented 6 years ago

I'm tidying up PyFile so that it's ready for a PR and I have a few questions:

In file_new I assume that JyNI_PyObject_FromJythonPyObject(jfile) will create the PyObject* correctly if I give it a newly created jobject jfile = (*env)->NewObject(env, pyFileClass, pyFile_Constructor); is that correct? Or do I need to make a JyObject/ do other fancy things like use the tp_alloc function?

I also need to deal with the args/kwds in file_init, would JyNI_JythonPyObject_FromPyObject(args) handle this in any meaningful way? I feel it may be stretching what it can do to be giving it an a array of args or kwds and expecting it to just work out what they should look like so that they can be given as arguments to a jython function. (There is also the separate issue of whether the arguments are the same or not, but that just requires testing which I can do).

I'm still not entirely sure what to do for fileno, at the moment it just delegates to jython in the standard PyObject_GenericGetAttr way, to do anything else would require a change to Jython, or a custom tp_getattro. If I was to change something(leaving aside what the new fileno should return), which would be preferable, Jython or tp_getattro? We could also just leave it as is with the expectation that few C Extensions will call it directly when they could just call PyFile_AsFile instead.

Stewori commented 6 years ago

In file_new I assume that JyNI_PyObject_FromJythonPyObject(jfile) will create the PyObject correctly if I give it a newly created jobject jfile = (env)->NewObject(env, pyFileClass, pyFile_Constructor); is that correct? Or do I need to make a JyObject/ do other fancy things like use the tp_alloc function?

I think you have to do it like in SequenceIterator or in PySet. Creating a native object by converting it from the Java object is only supported for types that have an init function in JySync.c IIRC. Actually I would like to streamline this kind of creation and it might be that I once improved this in JyNI_PyObject_FromJythonPyObject(jfile), so you can try if you like. The issue is IIRC that xy_new functions are a special case that can result in call-cycles. If init functions for a type are present in JySync.c this does not circumvent the initialization either, it just moves it to the init function. Having a more automated implementation for such xy_new functions is actually a todo in my mind.

I also need to deal with the args/kwds in file_init, would JyNI_JythonPyObject_FromPyObject(args) handle this in any meaningful way?

Jython functions have a different call convention than CPython functions. While in CPython a tuple is passed for args and a dict for kw, in Jython it is only one tuple containing all args, plus optionally a string array assigning kw-interpretation to the trailng n args with n being the length of the string array (in Python 3 fastcall is an newer call convention that might work similar). JyNI_JythonPyObject_FromPyObject would not convert between these call conventions. It just gives you a tuple from the tuple that was passed in. Use JyNI_PyObject_Call (JyNI.c). It contains the right code to do this conversion. You don't even need to convert arg or kw, they can be passed in directly as C-PyObject*. (Todo: Maybe a version that accepts args and/or kw as jobject would be handy.) Also see PyCPeer.__call__ for the Java code that performs this conversion in the other direction.

Regarding file_fileno: I think the most compatible approach would be to keep the original implementation. That requires f->f_fp, so we need to set truncate_trailing of PyFile's type map entry such that f->f_fp is usable. Initialize it to NULL (in file_new) and let PyFile_AsFile populate it as a side effect. Let PyFile_AsFile return existing f->f_fp if it is already populated. Let file_fileno call PyFile_AsFile if f->f_fp is NULL. We need to take care to detect if the file was "silently" colsed on Java/Jython-side. I think PyFile.getClosed() is the right way to check this. The C-functions should perform their original fail-behavior if called on a closed file. If you need f->f_fp also somewhere else, don't forget to check PyFile.getClosed()there too. I think whenever a function discovers that PyFile.getClosed() returns true, f->f_fp should be zeroed as a side-effect. Please add a note to PyFile_AsFile that it needs to be revisited for Windows support.

CalumFreeman commented 6 years ago

Ok, I've updated file_init and file_new and pushed everything to PyFile_Implementing.

fileno

For file_fileno, it could be implemented like this:

static PyObject *
file_fileno(PyFileObject *f)
{
File* f_fp = PyFile_AsFile(f)    
if (f_fp == NULL)
        return err_closed();
    return PyInt_FromLong((long) fileno(f_fp));
}

This would behave identically but still allow us to have whatever implementation is best

But...

Memory leak?

I am slightly concerned that the current version of PyFile_AsFile creates a new FILE* every time it is called (viacFile = fdopen((int)fd, cmode); ... return cFile;) and this FILE* isn't tracked in GC. I think Extensions will expect this FILE* to be cleaned up with the PyFile since in CPython every time PyFile_AsFile is called it will return the same file pointer which is stored in the PyFile object. Then when the PyFile is GC'd that FILE* is also cleaned up, hence the extra ref count to make sure the FILE* isn't cleaned up when it's still in use and the PyFile isn't.

Because of all this the extensions won't clean up the FILE* and we aren't tracking it so can't deal with it. It may be that the FILE* will somehow be automatically deleted when it goes out of scope, but I don't know enough about C to have any confidence in suggesting that.

Basically, I think the current implementation will have a memory leak. The only way I can think of to solve this would be to store the file pointer and implement PyFile_Inc(or Dec)Ref. We should still only use the PyFile_AsFile and PyFile_Inc(or Dec)Ref to interface with whatever we build so that we can change the implementation details back and forth.

Staying in sync?

I'm also a little worried about keeping the FILE* in sync with the java file, so if we read a line in python, then read a line in C I think (but I'm not sure) we would expect Python to read the first line and C to read the second. This would work in CPython since it's the same FILE* underneath it all, but not in Jython+JyNI since the file pointer and the java FileIO are different and each track their current position in the file differently. Unless the location in the file is actually tracked by the OS in which case it is fine since we are opening the file from the file descriptor which is an OS level thing so they will be in sync automatically. I'll look into this some more.

Stewori commented 6 years ago

The memory leak concern is exactly why I suggested to have only a single file pointer FILE*, tracked in f->f_fp that is re-used all the time. Using new file pointers could maybe even change the result of fileno.

Regarding sync, that requires more investigation. Already to assess whether it is an issue at all. Doing something meaningful on this front is not feasible in your remaining project time. Let's just say, for now concurrent file access via native C-API and Java-side Jython API is not supported or at least not tested. I think most C-extension would do the whole file access in C or in Python and not really mix it. As with the access counter we can revisit this if it actually causes problems. Injecting a custom FileIO backend (that keeps track of sync) into Jython could solve this ultimately if required.

CalumFreeman commented 6 years ago

Yes, I thought that was one of the reasons you suggested tracking it, I didn't know it could affect fileno though.

Without trying to sync things, the remaining things to do in PyFile are: implement tests for: tp_alloc, tp_free, tp_iter, tp_iternext, tp_dealloc

tp_init is causing some trouble, probably because I'm not sure the test is right as I don't know how to set up the args/kwargs so they will parse correctly. I think this simple implementation should work:

static int
file_init(PyObject *self, PyObject *args, PyObject *kwds)
{
    jobject jfile = JyNI_JythonPyObject_FromPyObject(self);
    return JyNI_PyObject_Call(jfile, args, kwds);
}

But I don't really know for sure. I made a bit of an attempt at using the CPython implementation and using fill_file_fields as the place to call the jfile's init, but I ran into issues and it would take more than today to finish it.

The simple implementation doesn't seem to cause any problems so I think it does no harm, so PyFile could be pulled without any problems, but I don't know if tp_init/tp_new actually work to create a valid file from C. (tp_new does create an object with type file, but that's all I know it does for sure).

CalumFreeman commented 6 years ago

I should also note I have now added code to include f_fp and unlocked_count in PyFile_Implementing The FILE* is cleaned up and closed with the deletion of the PyFile object and there is a check to make sure the unlocked_count isn't positive. I also updated Building_Tests to avoid calling DecUseCount before IncUseCount since Dec checks to make sure it isn't negative meaning calling Dec before Inc would throw an error.