Open whilo opened 7 years ago
First of all, note that PyFile_WriteString
is already implemented and can serve as a pattern.
I remember I once decided it would be best to keep PyFile-stuff on Jython-side and call into that stuff on C-side. Such decisions are mostly made by carefully reviewing the header file. Especially if there are no macros accessing an object's internals (other than type, length, size and refcounter), the best way is to implement a Jython-centric solution like shown in PyFile_WriteString
.
That said, look at https://github.com/Stewori/JyNI/blob/master/JyNI-C/include/JyNI_JNI.h#L508 and https://github.com/Stewori/JyNI/blob/master/JyNI-C/src/JyNI_JNI.c#L449 and https://github.com/Stewori/JyNI/blob/master/JyNI-C/src/JyNI_JNI.c#L1100. At these three spots you must declare functions in Jython for C-access. https://github.com/Stewori/JyNI/blob/master/JyNI-C/include/JNI_util.h declares a bunch of magic and macros to simplify JNI-based declarations. Originally it was even more painful. See the macros https://github.com/Stewori/JyNI/blob/master/JyNI-C/include/JNI_util.h#L133 and below. These are meant for public access:
JNI_METH_CLASS(classID, name, ret, ...)
JNI_METH_INTERFACE(classID, name, ret, ...)
JNI_METH_CLASS2(classID, name, cName, ret, ...)
JNI_METH_STATIC(classID, name, ret, ...)
JNI_METH_STATIC2(classID, name, cName, ret, ...)
JNI_FIELD(classID, name, tp)
JNI_FIELD_STATIC(classID, name, tp)
JNI_CONSTRUCTOR(classID, cName, ...)
JNI_SINGLETON(classID, name, tp)
JNI_SINGLETON2(classID, name, cName, tp)
Yes, I have to document these (please help with that if you can!) Look at https://github.com/Stewori/JyNI/blob/master/JyNI-C/src/JyNI_JNI.c for plenty usage examples. Feel free to ask further questions if you struggle with using these macros.
Note that JNI_METH(classID, name, ret)
and JNI_METH_1(classID, name, ret, arg1, tp1)
at the very bottom are still unfinished/experimental and not used by JyNI. The plan is that they would also add convenient calling-code rather than just the declaration.
In http://www.kfu.com/%7Ensayer/Java/jni-filedesc.html we can see that java.io.FileDescriptor
has a private int
-field fd
. I hope we can access that to implement PyFile_AsFile
. A Jython PyFile
uses org.python.core.RawIO
as backend. Most likely this would be a FileIO
, which contains a RandomAccessFile
, which in turn contains a java.io.FileDescriptor
. This will involve some struggling with private API and is thus best done directly from JNI (we might have to revisit the solution for Java 9). It might be a bit tricky to get this right; good luck!
Functions that don't access internals of the PyFileObject
struct, e.g. PyObject_AsFileDescriptor
can be commented in without a change. If they call other functions, we should focus on implementing these.
Ok, I have followed your advice. I have first exported the filedescriptor, which was already accessible in the Jython API (through the usual reflection hack in FileIO.__int__()
:
https://github.com/whilo/jython/commit/68ed56a90c945984d6005e755e6e0446ca57cc01
This was much easier than unwrapping the nested objects through the C API for me, but this is obviously only a hack to see how far I can get. I then added the necessary method declarations as you suggested:
https://github.com/whilo/JyNI/commit/c51c27754f80c0b807e1640b6c1c3a6b0f168383
I can import matplotlib.pyplot now (pylab still crashes):
I can create a plot object, but after the tkinter window pops up the process immediately crashes.
christian@lacan ~/D/JyNI> java -cp jython.jar:build/JyNI.jar org.python.util.jython
Jython 2.7.1rc3 (, Jun 19 2017, 12:45:06)
[OpenJDK 64-Bit Server VM (Oracle Corporation)] on java1.8.0_131
Type "help", "copyright", "credits" or "license" for more information.
>>> import setup_path
>>> import matplotlib.pyplot as plt
>>> plt.plot([1,2,3])
[<matplotlib.lines.Line2D object at 0x2a3>]
>>> plt.show()
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f93d1a0bdc8, pid=16675, tid=0x00007f93fa375700
#
# JRE version: OpenJDK Runtime Environment (8.0_131-b11) (build 1.8.0_131-8u131-b11-0ubuntu1.17.04.1-b11)
# Java VM: OpenJDK 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [multiarray.x86_64-linux-gnu.so+0x5adc8]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/christian/Development/JyNI/hs_err_pid16675.log
Compiled method (c1) 48634 4793 1 JyNI.gc.DefaultTraversableGCHead::setLinks (6 bytes)
total in heap [0x00007f93e17affd0,0x00007f93e17b0288] = 696
relocation [0x00007f93e17b00f8,0x00007f93e17b0120] = 40
main code [0x00007f93e17b0120,0x00007f93e17b01c0] = 160
stub code [0x00007f93e17b01c0,0x00007f93e17b0250] = 144
oops [0x00007f93e17b0250,0x00007f93e17b0258] = 8
scopes data [0x00007f93e17b0258,0x00007f93e17b0260] = 8
scopes pcs [0x00007f93e17b0260,0x00007f93e17b0280] = 32
dependencies [0x00007f93e17b0280,0x00007f93e17b0288] = 8
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
fish: 'java -cp jython.jar:build/JyNI.…' terminated by signal SIGABRT (Abbruch)
I can read the error log, but I guess I should study the dumps? It points to multiarray, so I guess it is not related to further PyFile methods. I would tackle them, but first I would like to get matplotlib basically running. What are your suggestions?
Is the issue reproducible? Can you isolate the Python code in matplotlib that provokes the crash? First thing to do is to create a minimal Python code sample (ideally independent from matplotlib) that reproduces the crash. Then post this sample as a new issue, so this thread can focus on PyFile support.
Issue is here now: https://github.com/Stewori/JyNI/issues/12
Whilo, could you build a PR from what you achieved so far? That would make it easier for others (including myself) to reproduce the subsequent issue you describe.
Please adjust your approach such that it works without modifications in Jython. IMO this would be not too complicated (one additional native field access). On the other hand, having in mind that Jython is currently in late RC-phase for 2.7.1 release, such a change would only make it into Jython 2.7.2 and I wouldn't make a guess when this release will happen (hopefully the huge delay of 2.7.1 was an exception, but let's better not rely on that).
Sure, I can do that. The current approach is a hack to get something working and start from there. I don't think I need a patched Jython version.
Hey whilo. Just posting a polite reminder to file a PR for your work on PyFile. It would be a pity to loose this, even if it's just a starting point so far...
Thanks for the reminder, it is high on my todo list. :)
So, my work on porting JyNI to Windows is almost done. This means, it is a good time for a next release. Do you think we can get some PyFile support into JyNI alpha 5?
Thanks for insisting. I will sit down and work on it this evening. Will you be online on IRC?
I can go online on IRC later. Note that it's not due exactly today. I'd like to make the release somewhat end of the week or next weekend. Also, it's not totally crucial to get PyFile into this release, but it would be nice.
So, I am observing IRC log frequently, but I cannot guarantee that I can be online at a particular time. Ithink it would be best to resolve questions here in an async manner.
Ok, I have been there late in the evening, but I guess it was too late :). I have tried to do the JVM PyFile API calls in the C context, but somehow printing (printf) does not work anymore. I am not sure why.
printf
prints async regarding JVM console output. In many consoles printf
and puts
output will only show up on JVM shutdown or at random-like time points when C output is flushed. If JyNI.h
is included, use jputs
for text or jputsLong
for integers; their output will be in sync with JVM output. I didn't yet write an actual jprintf
, feel free to add one if you need it.
Do you need printf
for anything else than debugging?
Right, the C routine PyFile_AsFile(PyObject *f)
is not called anymore when importing matplotlib it seems. Printing works. Is there an easy way to test it without depending on matplotlib? I am not sure what other libraries need native pyfile access. I can have a look later.
Maybe there is some test code in CPython. Otherwise we should maybe write our own test(s). DemoExtension is intended to serve as a collection for native test code.
Ok, deleting the fontcache of matplotlib did the trick for now. But now I need to cast and access the FileIO object and I have no clue how to do that with JNI.
I will try to figure it out, but having a general chat medium with offline capability, code formatting and mail notifications would really be helpful for this, especially to have some period of time where I can just ask you dumb questions. From the open-source projects I run and support I can really recommend setting up gitter. It is open-source, is integrated with github and takes maybe five minutes to setup.
Maybe you can add PyFile.fd as a static method to JyNI.java
, e.g. like static int JyNI_PyFile_fd(PyFile file)
. Then it would be a single easy JNI call you can do the same way like you called PyFile.fd
.
But then we'd keep changes to JyNI and needn't modify Jython.
FILE *
PyFile_AsFile(PyObject *f)
{
printf("Accessing file.\n");
env(-1);
if (f == NULL)
puts("PyFile_AsFile with NULL-pointer");
jobject f2 = JyNI_JythonPyObject_FromPyObject(f);
//(*env)->CallVoidMethod(env, ((JyObject*) f)->jy, pyFileWrite, (*env)->NewStringUTF(env, s));
jobject fileno = (*env)->CallObjectMethod(env, f2, pyFile_fileno);
printf("fileno: %i \n", fileno);
jobject fd_pyint = (*env)->CallObjectMethod(env, fileno, FileIO___int__);
jint fd_int = (*env)->CallObjectMethod(env, fd_pyint, pyInt_getValue);
printf("fd_int: %i \n", fd_int); // prints 0 ?!?
jint fd = (*env)->CallObjectMethod(env, f2, pyFile_fd);
printf("FD: %i \n", fd);
// TODO get mode from PyFile
return fdopen(fd, "r");
//todo: JNI Exception handling
}
The fd_int printf line prints 0 and not the correct fd value. I guess that casting happens implicitly and if I try to call a non-existing method I get an exception from the JVM.
So I ignored the casting issue, but the result makes no sense so far to me. I guess PyInteger is 0 by default due to JVM Integer being 0 by default, but I am at least getting a proper instance of it, which is weird. If I had done something wrong on the way to access it, it should have thrown or caused a segmentation fault.
I have implemented the same path to fd as in PyFile.fd()
now, so no idea why it fails.
If my current approach in C is actually worse in your opinion than adding a static method to JyNI (which is also somewhat odd by providing an external method for PyFile), then I will take that way. I thought it would be right to do it from C, without implementing custom helpers.
jint fd_int = (*env)->CallObjectMethod(env, fd_pyint, pyInt_getValue);
should rather be
jint fd_int = (*env)->CallIntMethod(env, fd_pyint, pyInt_getValue);
That should fix it.
(in jint fd = (*env)->CallObjectMethod(env, f2, pyFile_fd);
respectively)
If my current approach in C is actually worse in your opinion than adding a static method to JyNI (which is also somewhat odd by providing an external method for PyFile)
Putting a sequence of Java calls to Java side is faster AFAIK, because it reduces calls from C into JVM which are JNI's slowest facet. That said, there is plenty of such stuff in JyNI. I guess I became a little painless about this. We can maybe add a proper method to Jython, but until that finds its way into a release, an external implementation in JyNI is a good migration path.
I've started working on finishing implementing the PyFile API. It's on a separate branch/fork and I hope to have decent tests for the API before merging it with master. But if a working PyFile API is needed for anything then I can submit a pull request earlier.
I am unsure as to what to do about the memory management functions in PyFile, I feel like there will be some details about JyNI GC that will come into play here. The functions I am most unsure about are:
PyFile_DecUseCount, PyFile_IncUseCount, tp_dealloc and tp_weaklistoffset.
There are generic object functions for tp_alloc and tp_free, I haven't tested them yet so they may not work as they may also have quirks.
My current progress is on my Implementing-PyFile branch. Before submitting a pull request I will fix the git history and clean things up(I'm not used to memory management, variables aren't always declared at the start of blocks, etc). I just thought it was worth having the progress somewhere other than just my local eclipse workspace. Most of PyFile is done and passes tests (33/46).
Other than the memory management things in the question, the remaining functions that don't have tests that they pass are: tp_iternext, tp_iter, tp_init and PyFile_WriteObject - I'm finding these tricky but I hope to finish them on Monday file_exit, file_enter and file_truncate - It wasn't instantly obvious what these do and I haven't looked into them yet, hopefully they are easy enough and I can also do them on Monday.
Tests: I've put the tests in a separate file, I think ideally there would be a C file and python file for each section of the API. I had looked into the CPython test suite, they do have tests for the Capi, but it seemed quite difficult to get them to run as they wouldn't import without all of the Capi being available/declared. I tried just uncommenting a lot of code in the hope of fooling the tests enough to import (of course they would still fail) but this didn't work very well. I also tried to cut out just the tests I wanted from their test module, at which point I realised I couldn't actually find any that tested the PyFile Capi specifically. At this point I gave up on trying to use the CPython tests and just wrote my own.
Re UseCount:
We need to figure out what fobj->unlocked_count
actually means and - more importantly - how Jython implements that functionality. It looks like kind of an additional ref counter that counts how many objects/threads/? are currently using the file. Usually I would assume the ref counter could take care of this, but there must have been a reason why they added another counter. Maybe because the file handle int
is not necessarily the same thing as the file object. I.e. the handle can be used by more participants than the file object. In that case it would make sense that fobj->unlocked_count
is kind of a reference counter for the file handle while the ordinary ref counter manages the file object.
@CalumFreeman That's just a guess from looking at the code. Is there something in CPython docs?
It looks like in Jython the role of the handle is played by a TextIOBase
object, no obvious use counter so far. Maybe it's just using Java memory management w.r.t. TextIOBase
to provide use count equivalent guarantees. In general that seems not so much an issue in Jython as there is not really public API to get the handle. I'd suggest we handle it in JyNI by maintaining the field fobj->unlocked_count
in native PyFile objects and regart the Java PyFile
object as constantly holding one use count. So it the counter would be initialized to one if a Java PyFile
object is handled to native side. A purely natively created file object gets the usual initialization and if it is passed to Java side use count it incremented. Remains the question how we tidy up. Can we somehow hook into PyFile.close
or so? E.g. by injecting a custom PyFile.Closer
the wraps the original closer but additionally informs JyNI.
Okay, I took a closer look. In CPython there is this code snippet:
static PyObject *
close_the_file(PyFileObject *f)
{
int sts = 0;
int (*local_close)(FILE *);
FILE *local_fp = f->f_fp;
char *local_setbuf = f->f_setbuf;
if (local_fp != NULL) {
local_close = f->f_close;
if (local_close != NULL && f->unlocked_count > 0) {
if (f->ob_refcnt > 0) {
PyErr_SetString(PyExc_IOError,
"close() called during concurrent "
"operation on the same file object.");
...
Regarding the error case close() called during concurrent operation on the same file object.
the counterpart in Jython is not so obvious. I think it relies on the JVM throwing an IOException at FileIO:
public void close() {
if (closed()) {
return;
}
try {
fileChannel.close();
} catch (IOException ioe) {
throw Py.IOError(ioe);
}
super.close();
}
I suggest to write a new subclass of FileIO
, e.g. JyNIFileIO
. For simplicity it wraps the original FileIO
and delgates everything to its backend. From JNI we can inject this into PyFile.file
. It would implement close
such that a maybe native counterpart is checked for fobj->unlocked_count > 0
. I guess the best idea would be to call native close_the_file
to let it perform that check.
In this scope I guess it's not necessary - unlike I suggested earlier - to track an fobj->unlocked_count
count for the Java PyFile
. Simply initialize it to zero, maintain the counter like C code currently does. We only have to incorporate this into the check in FileIO.close
.
Regarding the constructor I think we should implement it like other objects fully delegated to Jython. E.g. look at set.
tp_free
to PyObject_Free
(as opposed to formerly PyObject_Del
)static void
file_dealloc(PyFileObject *f)
{
JyNIDebugOp(JY_NATIVE_FINALIZE, f, -1);
// We will revisit that:
//if (f->weakreflist != NULL)
// PyObject_ClearWeakRefs((PyObject *) f);
ret = close_the_file(f);
if (!ret) {
PySys_WriteStderr("close failed in file object destructor:\n");
PyErr_Print();
}
else {
Py_DECREF(ret);
}
Py_TYPE(so)->tp_free(so);
}
Did you already implement close_the_file
? It should only perform the check for use count and then delegate to Jython's FileIO.close
in some way. Make sure there won't be a delegation cycle. See if you can figure that out. Otherwise we can follow up on that part here later.
Hope this helps! I admit that makes PyFile implementation a bit more complicated than I originally expected.
Maybe it would be cleaner to implement PyFile_IncUseCount
and PyFile_DecUseCount
in Jython's original FileIO
class.
Via FileIO.__int__()
Jython offers access to the file handle which a user could use in C-code (e.g. by JNI or JNR or so independently from JyNI) to access the file directly. However according to PyFile_IncUseCount
doc one would have to call PyFile_IncUseCount
/PyFile_DecUseCount
API around such an access. Since this API is missing in Jython a user currently does not have a chance to do it right. "Who would directly use the file handle to access the file?" one might ask, which I guess is why it was never implemented. Now we have an answer: C-extensions!
I think I will open an issue to suggest adding this API.
@CalumFreeman I think the effort of implementing PyFile_IncUseCount
and PyFile_DecUseCount
properly would not really pay off. Even for C-extensions the use cases for these are not that common. I suggest we go with a dummy implementation until we encounter a C-extension that actually depends on this. Please implement them for now such that they just print out a warning, e.g. "Warning: PyFile_IncUseCount not implemented." Please include into the implementation a doc comment that links this thread (#11) and also http://bugs.jython.org/issue2699.
With that done I suggest to shift focus to iterator support as it will have much higher impact.
Ok, I've set PyFile_IncUseCount
and PyFile_DecUseCount
to just print warning messages, I've also cleaned up the code a little, but I haven't fixed the git history. I'll switch focus to iterator support although I'm not quite sure PyFile is ready for a pull request. I also realised that it was unnecessary to implement many of the file_* methods as the PyObject_GenericGetAttr finds the java PyFile methods automatically. This mostly works, but there is a problem with the jython fileno(), it doesn't return the file descriptor as expected (this is why the workaround was needed in PyFile_AsFile):
CPython:
>>> file = open("/tmp/JyNI", "w+")
>>> file.fileno()
3
Jython:
>>> file = open("/tmp/JyNI", "w+")
>>> file.fileno()
org.python.core.io.FileIO@282308c3
This would be best fixed in jython although it's probably only noticed in JyNI/C extensions and we already have a workaround for the place it is most important. I don't think it's worth spending time on at this point, iterator support is more important.
Regarding fileno please also see http://bugs.jython.org/issue2320.
So, the situation seems to be as follows:
In issue 2320 it is mentioned that Python-level fileno
may actually return an object that is convertible to an int rather than an actual plain int. That means, you/we should compareint(file.fileno())
between Jython and CPython rather than plain file.fileno()
. Looking at fileno doc this behavior seems to be undocumented, but issue 2320 mentions there was a BDFL decision about this. Actually we should push for a doc correction but "for some reason" I'd expect that to be controversial among CPython crowd. So I'd suggest not to heat this up for old Python 2.7 but reconsider it once Jython 3 is approaching.
Then, I wondered what the difference between fileno
and PyFile_AsFile
is or should be. It seems that on Windows, file handle and file descriptor are distinct terms, while on POSIX it's the same. In general, the file handle is the thing returned by open
and the file descriptor is the thing returned by fileno
. Interestingly, fileno
consumes the thing returned by open
. Tht said, Java has it that way that on Windows FileDescriptor.fd
is always -1
, I suppose to indicate it's a posix thing.
According to FileDescriptor implementation on Windows we should look at the field FileDescriptor.handle
. This is not exposed by Jython and we will have to get it directly via JNI.
Remaining question is what to do with fileno
(on Windows). On Linux/POSIX it can simply delegate to PyFile_AsFile
or be a copy of it. On Windows we'll have to check what CPython does. Return -1
? That's easy to emulate. Otherwise it's probably Window's notion of fileno we should call, i.e. _fileno(PyFile_AsFile(whatever))
.
not quite sure PyFile is ready for a pull request
Maybe you could commit your work to your own JyNI fork. That would be the preparation for a PR anyway. That would allow for an informal review.
There are some branches on my own fork that sort of show what I've been doing. I've split the testing out into it's own branch to avoid conflicts when building tests for iterators and pyfile.
PyFile has two branches, a messy one and a cleaned up one. The cleaned up one is (obviously) cleaner and they both pass the same tests so I think it is better. (There's also a branch for iterators, I've fixed the numpy.random.randint() bug but not much else. I want to do a lot more on that branch)
Sorry, I somehow forgot to look for branches :) Be prepared for some comments...
I'm tidying up PyFile so that it's ready for a PR and I have a few questions:
In file_new I assume that JyNI_PyObject_FromJythonPyObject(jfile)
will create the PyObject*
correctly if I give it a newly created jobject jfile = (*env)->NewObject(env, pyFileClass, pyFile_Constructor);
is that correct? Or do I need to make a JyObject/ do other fancy things like use the tp_alloc function?
I also need to deal with the args/kwds in file_init, would JyNI_JythonPyObject_FromPyObject(args)
handle this in any meaningful way? I feel it may be stretching what it can do to be giving it an a array of args or kwds and expecting it to just work out what they should look like so that they can be given as arguments to a jython function. (There is also the separate issue of whether the arguments are the same or not, but that just requires testing which I can do).
I'm still not entirely sure what to do for fileno, at the moment it just delegates to jython in the standard PyObject_GenericGetAttr
way, to do anything else would require a change to Jython, or a custom tp_getattro
. If I was to change something(leaving aside what the new fileno should return), which would be preferable, Jython or tp_getattro? We could also just leave it as is with the expectation that few C Extensions will call it directly when they could just call PyFile_AsFile
instead.
In file_new I assume that JyNI_PyObject_FromJythonPyObject(jfile) will create the PyObject correctly if I give it a newly created jobject jfile = (env)->NewObject(env, pyFileClass, pyFile_Constructor); is that correct? Or do I need to make a JyObject/ do other fancy things like use the tp_alloc function?
I think you have to do it like in SequenceIterator or in PySet. Creating a native object by converting it from the Java object is only supported for types that have an init function in JySync.c IIRC. Actually I would like to streamline this kind of creation and it might be that I once improved this in JyNI_PyObject_FromJythonPyObject(jfile)
, so you can try if you like. The issue is IIRC that xy_new
functions are a special case that can result in call-cycles. If init functions for a type are present in JySync.c this does not circumvent the initialization either, it just moves it to the init function. Having a more automated implementation for such xy_new
functions is actually a todo in my mind.
I also need to deal with the args/kwds in file_init, would JyNI_JythonPyObject_FromPyObject(args) handle this in any meaningful way?
Jython functions have a different call convention than CPython functions. While in CPython a tuple is passed for args and a dict for kw, in Jython it is only one tuple containing all args, plus optionally a string array assigning kw-interpretation to the trailng n args with n being the length of the string array (in Python 3 fastcall is an newer call convention that might work similar).
JyNI_JythonPyObject_FromPyObject
would not convert between these call conventions. It just gives you a tuple from the tuple that was passed in. Use JyNI_PyObject_Call
(JyNI.c). It contains the right code to do this conversion. You don't even need to convert arg or kw, they can be passed in directly as C-PyObject*
. (Todo: Maybe a version that accepts args and/or kw as jobject
would be handy.) Also see PyCPeer.__call__
for the Java code that performs this conversion in the other direction.
Regarding file_fileno
:
I think the most compatible approach would be to keep the original implementation. That requires f->f_fp
, so we need to set truncate_trailing
of PyFile's type map entry such that f->f_fp
is usable. Initialize it to NULL (in file_new
) and let PyFile_AsFile
populate it as a side effect. Let PyFile_AsFile
return existing f->f_fp
if it is already populated. Let file_fileno
call PyFile_AsFile
if f->f_fp
is NULL. We need to take care to detect if the file was "silently" colsed on Java/Jython-side. I think PyFile.getClosed()
is the right way to check this. The C-functions should perform their original fail-behavior if called on a closed file. If you need f->f_fp
also somewhere else, don't forget to check PyFile.getClosed()
there too. I think whenever a function discovers that PyFile.getClosed()
returns true, f->f_fp
should be zeroed as a side-effect.
Please add a note to PyFile_AsFile
that it needs to be revisited for Windows support.
Ok, I've updated file_init
and file_new
and pushed everything to PyFile_Implementing.
For file_fileno, it could be implemented like this:
static PyObject *
file_fileno(PyFileObject *f)
{
File* f_fp = PyFile_AsFile(f)
if (f_fp == NULL)
return err_closed();
return PyInt_FromLong((long) fileno(f_fp));
}
This would behave identically but still allow us to have whatever implementation is best
But...
I am slightly concerned that the current version of PyFile_AsFile
creates a new FILE*
every time it is called (viacFile = fdopen((int)fd, cmode); ... return cFile;
) and this FILE*
isn't tracked in GC. I think Extensions will expect this FILE*
to be cleaned up with the PyFile
since in CPython every time PyFile_AsFile
is called it will return the same file pointer which is stored in the PyFile
object. Then when the PyFile
is GC'd that FILE*
is also cleaned up, hence the extra ref count to make sure the FILE*
isn't cleaned up when it's still in use and the PyFile
isn't.
Because of all this the extensions won't clean up the FILE*
and we aren't tracking it so can't deal with it. It may be that the FILE*
will somehow be automatically deleted when it goes out of scope, but I don't know enough about C to have any confidence in suggesting that.
Basically, I think the current implementation will have a memory leak. The only way I can think of to solve this would be to store the file pointer and implement PyFile_Inc(or Dec)Ref
. We should still only use the PyFile_AsFile
and PyFile_Inc(or Dec)Ref
to interface with whatever we build so that we can change the implementation details back and forth.
I'm also a little worried about keeping the FILE*
in sync with the java file, so if we read a line in python, then read a line in C I think (but I'm not sure) we would expect Python to read the first line and C to read the second. This would work in CPython since it's the same FILE*
underneath it all, but not in Jython+JyNI since the file pointer and the java FileIO are different and each track their current position in the file differently. Unless the location in the file is actually tracked by the OS in which case it is fine since we are opening the file from the file descriptor which is an OS level thing so they will be in sync automatically. I'll look into this some more.
The memory leak concern is exactly why I suggested to have only a single file pointer FILE*, tracked in f->f_fp
that is re-used all the time. Using new file pointers could maybe even change the result of fileno.
Regarding sync, that requires more investigation. Already to assess whether it is an issue at all. Doing something meaningful on this front is not feasible in your remaining project time. Let's just say, for now concurrent file access via native C-API and Java-side Jython API is not supported or at least not tested. I think most C-extension would do the whole file access in C or in Python and not really mix it. As with the access counter we can revisit this if it actually causes problems. Injecting a custom FileIO backend (that keeps track of sync) into Jython could solve this ultimately if required.
Yes, I thought that was one of the reasons you suggested tracking it, I didn't know it could affect fileno though.
Without trying to sync things, the remaining things to do in PyFile are: implement tests for: tp_alloc, tp_free, tp_iter, tp_iternext, tp_dealloc
tp_init is causing some trouble, probably because I'm not sure the test is right as I don't know how to set up the args/kwargs so they will parse correctly. I think this simple implementation should work:
static int
file_init(PyObject *self, PyObject *args, PyObject *kwds)
{
jobject jfile = JyNI_JythonPyObject_FromPyObject(self);
return JyNI_PyObject_Call(jfile, args, kwds);
}
But I don't really know for sure. I made a bit of an attempt at using the CPython implementation and using fill_file_fields
as the place to call the jfile's init, but I ran into issues and it would take more than today to finish it.
The simple implementation doesn't seem to cause any problems so I think it does no harm, so PyFile could be pulled without any problems, but I don't know if tp_init/tp_new actually work to create a valid file from C. (tp_new does create an object with type file, but that's all I know it does for sure).
I should also note I have now added code to include f_fp
and unlocked_count
in PyFile_Implementing The FILE*
is cleaned up and closed with the deletion of the PyFile object and there is a check to make sure the unlocked_count
isn't positive. I also updated Building_Tests to avoid calling DecUseCount before IncUseCount since Dec checks to make sure it isn't negative meaning calling Dec before Inc would throw an error.
I have explored supporting matplotlib font loading, which require PyFile support.
I have commented in
Which is called and immediately SIG_ABORTS. I get a dump file from the JVM:
Just returning
NULL
is not sufficient as a quick hack and I guess that I need to wrap a Jython class for PyFile instead of accessing the pointer directly. You said it would be easy to fix, but in the comment you say:I had some frustrations with filesystem APIs lately for my kv-store implementation, so I would also be careful with double access. What are your suggestions to proceed? Also do you have some chat channel? This might be easier to get started.