Open dcharkes opened 2 years ago
Approach 1:
type_map a la ffigen
pseudo-config
type_map:
'com.apache.pdfbox.PDDOcument': 'pdfbox'
'android.os.Process': 'android_os'
import_libraries:
'pdfbox': 'package:pdfbox/pdfbox.dart'
'android_os': 'package:android/os.dart'
Approach 2: similar to above, but infer library names by package names.
import_map:
'com.apache.pdfbox.PDDOcument': 'pdfbox' // library imported as `pdfbox_`
'android.os.Process': 'package:android/os.dart' // library imported as `os_`
Advantage: simpler config Disadvantage: less control over import names in bindings, if that's an issue.
Cross linking the ffigen issue:
cc @mannprerak2 any suggestions from thinking about this in the ffigen context?
As detailed in dart-lang/native#521 , We don't have transitive dependency resolution at least now. Only specified packages are traversed.
So in the scenario, {A, B, C}, B and C import A.T
, both B and C have to specify A.T using import map only.
But yeah, the fundamental issue persists even if the dependencies aren't implicitly included.
If b.yaml
explicitly binds B.X, A.Y
and c.yaml
binds C.Z, A.Y
, Y from both packages isn't the same.
In jni there's no structs, everything is a fragile, weakly typed pointer so you can cast. But for the fundamental issue, I think there's no solution.
In such cases, manually refactoring A
into seperate jnigen config, and linking using some import map is best bet.
With a mapping from Java type to generated dart binding (both approaches), the API gets tied to whether we chose to generate single file or multi file. If the multi-file has approach still has a central file that exports everything, then it doesn't matter.
I like approach 2 better. I don't think we care about the imported library name in the generated file.
How about we make the keys regexes or prefixes? That way we can match all types in a package:
# Map<QualifiedJavaType, DartLibraryUrl>
imports:
'com.apache.pdfbox.*': 'package:pdfbox/pdfbox.dart'
In such cases, manually refactoring
A
into seperate jnigen config, and linking using some import map is best bet.
Agreed. That is the only way that this will work. (Same for ffigen.)
I suppose regex is not required, if we specify
'com.apache.pdfbox': 'package:pdfbox/pdfbox.dart'
and jnigen needs to resolve com.apache.pdfbox.pdmodel.PDDocument
it should first check for com.apache.pdfbox.pdmodel.PDDocument
, no specification,
then for com.apache.pdfbox.pdmodel
then for com.apache.pdfbox
- match found.
Any pitfalls of this approach?
Yeah in the context of Java a prefix might make more sense than a regex. 👌
cc @mannprerak2 any suggestions from thinking about this in the ffigen context?
For ffigen, libclang provides USR (id), which are supposed to be consistent across translation unit (individual parsing). Maybe we can export a custom yaml/json file which can contain type-map for declarations with their USR for a parsed structures (Or maybe add annotations in the generated code itself) Any new library can simply specify this (In library-import config), and since the USR would be same, we can reuse the definition similar to how type-map works.
Not sure if this is useful for jnigen.
@mannprerak2 It's same in Java actually. The fully qualified name is supposed to be same across all libraries loaded in single process.
Does ffigen have any mechanism to export USRs automatically, or is it manually specified through type map?
Does ffigen have any mechanism to export USRs automatically, or is it manually specified through type map?
Not yet, But it should be relatively easy to add to ffigen.
(note to self: USR = Unified Symbol Resolution)
We should design a format.
What information should be in there?
If we have multiple USIs for one Uri, we should probably nest the USIs in Uris.
symbols.yaml
symbols:
'package:pdfbox/pdfbox.dart':
- 'com.apache.pdfbox.*'
or
symbols:
'package:pdfbox/pdfbox.dart':
- 'com.apache.pdfbox.pdmodel.PDDocument'
# ...
For jnigen it might be nicer to just do
symbols:
'com.apache.pdfbox' : 'package:pdfbox/pdfbox.dart'
But if we spell out all symbols
symbols:
'com.apache.pdfbox.pdmodel.PDDocument' : 'package:pdfbox/pdfbox.dart'
'com.apache.pdfbox.pdmodel.Foo' : 'package:pdfbox/pdfbox.dart'
'com.apache.pdfbox.pdmodel.Bar' : 'package:pdfbox/pdfbox.dart'
because we're unlikely to have multiple regexes in a single .dart
.
However, for ffigen, it is a bit more likely.
One question is: should we output an exact list of symbols, or a regex (or prefix)?
A prefix from the config in jnigen is probably fine, it is unlikely to collide with other names.
However, in the ffigen context, taking the method/class include filters is likely to not work. So there we should probably output the full list of symbols.
If we'd like to keep the ffigen and jnigen symbols.yaml
output consistent with each other, it should be
symbols:
# Map<DartPackageUri, List<UniqueSymbolIdentifer>>
'package:pdfbox/pdfbox.dart':
- 'com.apache.pdfbox.pdmodel.PDDocument'
- 'com.apache.pdfbox.pdmodel.Foo'
- 'com.apache.pdfbox.pdmodel.Bar'
# ...
wdyt?
In java, binaryName
is unambiguous and uniquely maps to a class. FQN may be ambiguous in rare cases. In jnigen
we use binaryName everywhere.
I am not familiar with USRs.
{A.h, X.h}
in one ffigen config1, {C.h, X.h}
in config2, is the USR for symbol x
from X
same in both? Assuming same system of course.I think it should also be possible to semi-manually work around this, by having the user provide a mapping which asserts X.h in config2 is equivalent to X.h in config1.
In the rest of this comment I am assuming we have a way to identify each binding symbol in string format, and there's some hierarchical structure. Fortunately this is almost always true in Java.
I think it's logical to have a mapping from required symbol to required import and not vice versa, because we lookup by symbol, and it's generated file && read by program, even if it looks tidier to have a tree with each import listing what it provides. Of course you can convert to different internal representation with a simple for loop. So matter of preference. Should this file be human readable? Does this file's size matter?
Assuming a hierarchical structure among bindings, i.e given an identifier you can tell it's parent (eg java package or header file), we can easily implement lookup in most-specific to most-general order. (As I gave an example above).
a.b -> 'package:a/b.dart';
resolve(a.b.C)
will look for a.b.C
first, then a.b
, and matches the above spec.
Assuming b encompasses all subpackages as well, a.b.c.D will match the above import spec. Maybe if we don't want this behavior, we can use a '/' or other symbol to explicitly specify that 'a.b' is recursively encompassing, and change the lookup code accordingly. That's not a problem.
Now, assuming some classes were filtered out using config, we need to explicitly write the filtered out class with null mapping, so an explicitly skipped class a.b.X won't match the above spec. Code should stop at seeing null.
There are more implementation details to still flesh out, but I think the basic idea is sufficient.
I think regular expressions may be harder to do constant time lookup, unless maybe you sort the list of all regexes and binary search the ones closest to what you are looking for. with prefixes there will be a single digit number of combinations you have to check in a map. Is there any specific advantage of using regex in ffigen context?
But yeah above all is leetcode. If it's only a machine read-write file I don't mind writing out every type either, it will still be smaller than generated binding file. And reading some JSON/YAML should be pretty fast compared to parsing C/Java.
https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_computer_science
Are they serializable in some way? Are they same across different invocations / parses of same set of files? Are they same for a given symbol across different sets of files? Assuming same system of course.
Yes to all.
Is there any specific advantage of using regex in ffigen context?
C libraries usually add a prefix to all the symbols in a library so as to prevent name collision. (E.g libclang uses clang_
for functions and CX
for rest. So a regexp is really useful here to quickly target all symbols of a library .
Edit: If you meant just using it in symbol.yaml context, then I don't think regexp is of much use.
Re: Re: YAML format
Should this file be human readable?
It makes it easier for people to understand the whole setup. Also, people will see it in diffs when committing and in PRs etc.
Does this file's size matter?
Likely not much, I presume that everything (including git) compresses files. And repeated strings are relatively easy to compress. However, more text also means it is hard for users to read through if trouble shooting is needed. So smaller is usually better.
Of course you can convert to different internal representation with a simple for loop.
Exactly. So I prefer optimizing for simple for humans to understand in diffs etc.
tidier to have a tree
That would be slightly tidier than a map indeed.
Something like
symbols:
'package:pdfbox/pdfbox.dart':
'com.apache.pdfbox.pdmodel' :
'PDDocument' : true # equals 'com.apache.pdfbox.pdmodel.PDDocument' : true
'Foo' : true
'Bar' : true
'nested' :
'Baz' : true
'nested2' : true
# ...
But that requires organizing all the symbols that jnigen binds to into a tree of common-prefixes. I'm not sure if that is worth it over a flat list:
symbols:
'package:pdfbox/pdfbox.dart':
'com.apache.pdfbox.pdmodel.PDDocument' : true
'com.apache.pdfbox.pdmodel.Foo' : true
'com.apache.pdfbox.pdmodel.Bar' : true
# ...
We'd like to have a map (or set) in the Dart runtime for quick lookup, but in yaml it just creates noise. So probably the list that I suggested is nicer.
symbols:
'package:pdfbox/pdfbox.dart':
- 'com.apache.pdfbox.pdmodel.PDDocument'
- 'com.apache.pdfbox.pdmodel.Foo'
- 'com.apache.pdfbox.pdmodel.Bar'
# ...
Now, assuming some classes were filtered out using config, we need to explicitly write the filtered out class with null mapping, so an explicitly skipped class a.b.X won't match the above spec. Code should stop at seeing null.
This is a reason to not use prefixes at all in our yaml file., but only generate mappings for the full path.
Based on the discussions for ffigen:
format_version: 1.0.0
files:
"../generated/base_gen.dart" :
used_config:
<some-symbol-that-infuences-importing-bindings>: <some-value>
symbols:
'package:pdfbox/pdfbox.dart':
- 'com.apache.pdfbox.pdmodel.PDDocument'
- 'com.apache.pdfbox.pdmodel.Foo'
- 'com.apache.pdfbox.pdmodel.Bar'
Note that compared to https://github.com/dart-lang/native/issues/539, we don't have to include a name
per unique symbol, because the last part after the last .
is the symbol. (If that turns out to not be true, we would need to do the same as the ffigen config and make the unique symbols a map with a name
member instead of a list.)
If that turns out to not be true
Yes. symbols can be renamed, both for conflict resolution and probably in future, manually.
name = mapping["name"] ?? getElementName(binaryName)
, not a problem in practice I think.
Few notes / questions:
format_version: 1.0.0
files:
"lib/src/" :
used_config:
output_structure: package_structure
symbols:
- 'com.apache.pdfbox.pdmodel.PDDocument'
- 'com.apache.pdfbox.pdmodel.Foo'
- 'com.apache.pdfbox.pdmodel.Bar'
So this means the above classes are in lib/src/ under java-esque folder tree, and imports should be resolved as such.
I am still catching upto discussions on ffigen. I may be missing few points from the discussion.
This also raises a question of lots of small imports vs few large imports (eg per package _package.dart
files).
Okay, so we need name as well.
(Oops, I copy pasted too much. We had two file paths in there.)
I believe output_structure
is compatible.
format_version: 1.0.0
files:
'package:pdfbox/pdfbox.dart': # Everything exported from some lib/src/generated/_all.dart
used_config: # Optional.
# Probably empty for now.
symbols:
'com.apache.pdfbox.pdmodel.PDDocument':
name : 'PDDocument'
'com.apache.pdfbox.pdmodel.Foo':
name : 'Foo'
'com.apache.pdfbox.pdmodel.Bar':
name : 'BarRenamed'
format_version: 1.0.0
files:
'package:pdfbox/com/apache/pdfbox/pdmodel/PDDocument.dart': # Directly generated into the public API of the package.
symbols:
'com.apache.pdfbox.pdmodel.PDDocument':
name : 'PDDocument'
'package:pdfbox/com/apache/pdfbox/pdmodel/Foo.dart':
symbols:
'com.apache.pdfbox.pdmodel.Foo':
name : 'Foo'
'package:pdfbox/com/apache/pdfbox/pdmodel/Bar.dart':
symbols:
'com.apache.pdfbox.pdmodel.Bar':
name : 'BarRenamed'
Both of these would work when being imported from another binding generation.
Depending on configuration (multifile vs single-file), we might want a more general way of specifying class names
We don't have to, we can keep it simple, just list the full path of all files in the yaml as I did above.
I'll try to specify everything here before implementing the feature.
The default path will be a top-level jnigen_export.yaml
. This can be changed in output > export > path
:
output:
export:
path: a_export.yaml
This can be specified in export > exclude
:
export:
exclude:
- com.example.A
- com.example.B
version: 1.0.0
files:
'package:x/y.dart':
classes:
'com.example.A':
name: A
type_class: $AType
super_count: 2
type_params:
T:
'java.lang.Object': DECLARED
methods:
abc: 2 # in this case the method got the 2 suffix, so when extending this class, we need to make sure the corresponding method that overrides this, get the 2 suffix
super_count
can be removed by me in a PR, in that case it's also going to be removed from the yaml.
fields and method renames can be added in future versions.
This can be specified in import
. All of the ones below are supported:
import:
- 'package:x' # will look for the default top-level jnigen_export.yaml
- 'package:y/y_export.yaml'
- 'path/to/z_export.yaml'
If the same class comes in a subsequent import, that one will be overriden with a warning log.
import
is going to have package:jni
at the beginning of it by default.
If the same class is generated by the library itself, that takes precedence over the imported ones.
package:jni
has multiple classes such as List
what if we want to make sure that we have that one even if some other import overrides it?:
import:
- 'package:java_collections' # even though this has a class for java.util.List
- 'package:jni': # this takes precedence
show: 'java.util.List'
Or
import:
- 'package:java_collections': # You can hide individual ones
hide: 'java.util.List'
Am I missing something? Different structure or naming suggestions?
If the same class is generated by the library itself, that takes precedence over the imported ones.
That does mean that if the imported library has methods that take that class as an argument, they will still use the definition of that library. So we end up having to do casts on that boundary.
Maybe it's better to have an error, and having to explicitly hide
. Then having to do casts because of duplicate definitions does not come as a surprise to users.
Different structure or naming suggestions?
We use "symbol-file" as the name for the yaml file in FFIgen. And symbol-file
in both the import and export config. Should we consider taking that naming here as well so that we have a way to talk about the file. (It's called symbols because that's what we call the things you can look up in dlls, not sure if we should have a slightly different name in Java/Kotlin land.) Having proper name for the file instead of calling it foo_export.yaml
. What is the "thing" or "things" that we import/export?
Also, why have the package name in there? That is superfluous.
import: - 'package:x' # will look for the default top-level jnigen_export.yaml
Oh I like! (Though we could consider not doing that and just doing the non-magic thing. E.g.analysis_options.yaml
does include: package:lints/recommended.yaml
. We keep the ecosystem consistent. It's not that much more typing and is crystal clear.)
classes:
Are there any other things than classes
that we export/import?
(Just a large config for my own sake to see where imports fit in.)
preamble: |
// ..
output:
c:
path: 'src/'
subdir: 'third_party/'
library_name: 'foo'
dart:
path: 'lib/src/third_party/'
symbols:
path: 'lib/symbols.yaml' # So that it gets a library uri.
classes:
- 'org.example.foo.Foo'
maven_downloads:
source_deps:
- 'org.example....'
import:
- 'package:java_collections/symbols.yaml' # Using a library uri.
Again, 'symbols' might not be the right term to describe this. But 'import' and 'export' describes the action with the "thing" not the "thing", not the contents of that yaml file.
If we have nothing else than 'classes', maybe it should be classes.yaml
? Or jnigen_definitions.yaml
? The thing describes the structure of the generated Dart code, toplevel 'things' in Dart are 'definitions'.
Maybe it's better to have an error, and having to explicitly hide.
Good idea.
Though we could consider not doing that and just doing the non-magic thing.
Agreed, I think that's better. No default and having to explicitly naming it is good.
Are there any other things than classes that we export/import?
classes
key is there so that if we want to add some option to a single file package:x/y.dart
, we can. I don't exactly know if we need this, so maybe we can remove classes
and bump up the version in case we needed it in the future.
Tangential: I think it's also useful to have some way of breaking down the generated "symbols" files into multiple ones, maybe adding an option to add sub-exports per generated library. So if a library generates the entire java stdlib, we could import only `package:java/util/util_symbols.yaml'. Also including multiple symbols files into one.
Not necessary for closing this issue but could be useful in the future.
Tangential: I think it's also useful to have some way of breaking down the generated "symbols" files into multiple ones, maybe adding an option to add sub-exports per generated library. So if a library generates the entire java stdlib, we could import only `package:java/util/util_symbols.yaml'. Also including multiple symbols files into one.
That's a good idea.
But indeed let's just build the simple thing first, and bump the version if we want to cover subsequent use cases.
files: 'package:x/y.dart':
If this is relative path, you don't have to take package name x
as a config option. Is there an argument in favor of including package name?
I have been inclined against package name in config, although it might have made few things easier, because jnigen should work from anywhere in hierarchy, then you will be also adding the relative path to config from package root. (Correct me if I missed something).
show: 'java.util.List'
Nice!
It's better to change the format to json instead of yaml. The argument is the same as with native_assets
, the data is going to be produced and consumed by tools not humans, so yaml is unnecessary.
My glorious infestation of yaml is coming to an end...
We should support interop between 2 or more jnigen-generated packages.
_Originally posted by @mahesh-hegde in https://github.com/dart-lang/native/issues/761