beeware / batavia

A JavaScript implementation of the Python virtual machine.
http://pybee.org/batavia

GSOC: Implementing Batavia data types (draft) #728

Closed mdmjg closed 6 years ago

mdmjg commented 6 years ago

Introduction

My name is Maria del Mar Jaramillo, and I am a Computer Science student at New York University Abu Dhabi. I discovered my passion for programming during my senior year of high school, and ever since then, I have participated in web development and app development projects that have deepened my passion for coding.

Beeware Project

I would be interested in contributing to the Beeware Project because the organization’s purpose resonates with my own experiences. Creating a cross-platform application has allowed me to appreciate Beeware’s interest in helping new programmers create powerful applications through an approachable language such as Python. The idea behind Beeware is relevant, as no programmer will be limited by their lack of experience with languages such as Javascript; instead, coders will be able to build applications from the very beginning of their careers.

Contribution

Due to my experience with both Python and Javascript, I decided that working with Batavia, Beeware’s bytecode machine written in Javascript, would be my focus during Google Summer of Code. My goal is to contribute to the Beeware Project by completing the implementation of frozen set and a partial implementation of the time module. In order to complete the necessary implementations, I would need to use Cricket, Beeware’s test running tool. The error messages provided by the tool would point out the differences between CPython’s API and Batavia’s. I would also experiment with CPython and Batavia’s online platform in order to see the differences myself, and make sure that all my implementations are correct.

Background Research - Frozen Set

Both the set and the frozen set represent unordered collections of objects in Python. However, unlike the set, the frozen set is immutable. An important thing to consider while building the methods is that I cannot rely on .add, since new elements cannot be added to a frozen set. Hence, the frozen set method implementations will differ from the set implementations.

All methods for Frozen Set: ['__and__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__or__', '__rand__', '__reduce__', '__reduce_ex__', '__repr__', '__ror__', '__rsub__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__xor__', 'copy', 'difference', 'intersection', 'isdisjoint', 'issubset', 'issuperset', 'symmetric_difference', 'union']
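To make the immutability point concrete, here is a short sketch using CPython's built-in set and frozenset as the reference behaviour. It only illustrates what Batavia would need to match; it is not Batavia code.

```python
# Built-in CPython behaviour, used here as the reference.
mutable = {1, 2, 3}
frozen = frozenset(mutable)         # snapshot of the set at this point

mutable.add(4)                      # a set can grow in place
has_add = hasattr(frozen, "add")    # a frozenset exposes no mutating methods

# Operators build a new object instead of changing the operands.
combined = frozen | {10}

# Immutability makes a frozenset hashable, so it can be used as a dict key.
lookup = {frozen: "ok"}

print(has_add)                          # False
print(sorted(combined))                 # [1, 2, 3, 10]
print(lookup[frozenset([3, 2, 1])])     # ok
```

Because no method may change the object, every operation that "adds" elements has to return a new frozen set.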

Unimplemented Methods from Frozen Set (From Python Standard Library):
copy(): Creates a shallow copy of the object.
difference(): Returns the elements that are present in this and not in other.
isdisjoint(): Returns true if the two sets have no common values.
issubset(): Returns true if all values of this are present in other.
issuperset(): Returns true if all the values of other are in this.
symmetric_difference(): Returns a set with the elements from this and other but not the elements that are in both sets.

This gives a total of 6 unimplemented operations that would have to be written and tested.
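As a reference for what tests of these six methods would check, the following sketch runs each of them on CPython's built-in frozenset. The sample values are arbitrary.

```python
a = frozenset({1, 2, 3})
b = frozenset({3, 4})

results = {
    "copy": a.copy(),                                   # equal to a
    "difference": a.difference(b),                      # in a, not in b
    "isdisjoint": a.isdisjoint(b),                      # False: both contain 3
    "issubset": frozenset({1}).issubset(a),             # True
    "issuperset": a.issuperset({2, 3}),                 # True; any iterable is accepted
    "symmetric_difference": a.symmetric_difference(b),  # in exactly one of a, b
}

for name, value in results.items():
    print(name, value)
```

Note that the named methods accept any iterable as the argument, while the operator forms (for example `a - b`) require both operands to be sets.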

Background Research - Time module

The time module is partially implemented in Batavia. My goal is to implement all the missing methods from the time functions in the Python Standard Library (see 16.3.1 https://docs.python.org/3/library/time.html#functions, not entire module). Some key aspects of the time module include the fact that it may not handle dates before the epoch (January 1, 1970) or, on 32-bit systems, after 2038. Additionally, Python operates in seconds, whereas Javascript operates in milliseconds, which means that conversions must be written in the implementations. Furthermore, when parsing two-digit years, Python converts 69-99 to 1969-1999 and 0-68 to 2000-2068.
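The two conversions mentioned above can be sketched as follows. The function names are hypothetical and chosen only for illustration; they are not part of Batavia or of the Python standard library.

```python
def js_millis_to_py_seconds(millis):
    """Convert a JavaScript-style timestamp (milliseconds) to Python-style seconds."""
    return millis / 1000.0


def py_seconds_to_js_millis(seconds):
    """Convert Python-style fractional seconds to whole milliseconds."""
    return int(round(seconds * 1000))


def expand_two_digit_year(yy):
    """Map a two-digit year using the rule above: 69-99 -> 1969-1999, 0-68 -> 2000-2068."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 1900 + yy if yy >= 69 else 2000 + yy


print(js_millis_to_py_seconds(1500))   # 1.5
print(py_seconds_to_js_millis(1.5))    # 1500
print(expand_two_digit_year(69))       # 1969
print(expand_two_digit_year(68))       # 2068
```

Keeping the unit conversion in one pair of helpers makes it harder to mix seconds and milliseconds by accident elsewhere in the module.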

The following Javascript functions will be useful in the implementation of the time module:

Methods that need to be implemented:
.asctime([t]): Converts the time into a string of the form “Mon Jan 13 11:27:19 2001”
.clock_getres(clk_id): Only available on Unix. Returns the resolution (precision) of the specified clock.
.clock_gettime(clk_id): Only available on Unix. Returns the time of the specified clock.
.clock_settime(clk_id, time): Only available on Unix. Sets the time of the specified clock.
.ctime([secs]): Converts seconds into a string stating the local time. Seconds are counted since the beginning of the epoch.
.get_clock_info(name): Returns information on the clock named in the parameter.
.monotonic(): Returns the value, in fractional seconds, of a monotonic clock, i.e. a clock that cannot go backwards.
.perf_counter(): Returns the value, in fractional seconds, of the clock with the highest available resolution for measuring a short duration.
.process_time(): Returns the sum, in fractional seconds, of the system and user CPU time of the current process.
.strftime(format[, t]): Converts a time to a string in a specific format.
.strptime(string[, format]): Unnecessary because time.strftime will already be implemented.
.tzset(): Only available on Unix. Resets the time conversion rules.

This gives a total of 11 methods to implement, not counting strptime.
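For reference, this is how a few of the cross-platform functions from the list behave in CPython. The sketch uses the epoch in UTC as a fixed input so the formatted results do not depend on the local timezone.

```python
import time

epoch = time.gmtime(0)                    # struct_time for the epoch, in UTC
as_text = time.asctime(epoch)             # fixed 24-character layout
formatted = time.strftime("%Y-%m-%d %H:%M:%S", epoch)

start = time.monotonic()
later = time.monotonic()                  # never smaller than start

info = time.get_clock_info("monotonic")   # has .monotonic, .resolution, ...
cpu = time.process_time()                 # CPU seconds used by this process
tick = time.perf_counter()                # high-resolution timer value

print(as_text)      # Thu Jan  1 00:00:00 1970
print(formatted)    # 1970-01-01 00:00:00
print(later >= start, info.monotonic)     # True True
```

The values returned by monotonic(), perf_counter() and process_time() have no defined reference point, so only differences between two calls are meaningful.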

Project Plan

April 23 - May 14: Community Bonding

May 15 - May 20

May 21 - May 27

May 28 - June 3

June 4 - June 10

June 11 - June 17

June 18 - June 24

June 25 - July 1

July 2 - July 8

July 9 - July 15

July 16 - July 22

July 23 - July 29

July 29 - August 6

August 7 - August 16

Analyzing the Project Plan

The first part of my project will focus on implementing missing methods for the Frozen Set. There will be a week or two of slow progress between June 5 and June 15 due to exams, but I will make up for it with faster progress during the previous and following weeks. One of my main concerns is that some of the time functions rely on other Python time constants that have not been implemented yet. I could use Javascript functions to replace these constants, or I could leave a TODO in the code for now. I plan, however, to implement these constants near the end of the project in order to make the time module more complete.

Another possible problem is the difference between the Javascript time functions and the Python time functions. In particular, Javascript operates in milliseconds and Python in seconds, which can easily cause mistakes if conversions are not done properly. As a result, I will have to be careful while implementing and testing.

freakboy3742 commented 6 years ago

You'll need to open permissions so everyone can view this; or, preferably, copy the content directly into the ticket.

As for the "about me" section; that's not really needed. We already know who you are from the pre-selection process, so it won't hurt or harm your proposal.

freakboy3742 commented 6 years ago

Ok - I've taken a look; in short, we need a lot more detail. You've essentially just broken your 12 weeks into 3 blocks, one for each data type you're going to complete. What gives you any confidence that completing a datatype is 4 weeks of work?

This process isn't about starting with the time available and working backwards to allocate that time to tasks; you need to build up from the tasks that need to be done.

There's one other detail (which I thought I mentioned in the call, but I may have forgotten) - "testing and documentation" should not be a separate line item in your schedule. Testing and documentation is something that can't be separated from coding. If you're doing all your testing at the end, then you're not coding properly :-)

mdmjg commented 6 years ago

@freakboy3742 That's something I wanted to ask you actually. I'm not sure how much each data type could take... in your experience, do you think that it could be completed in much less or more than 4 weeks? I think some data types could definitely take more than others, as some of them have a lot more tests in the "not_implemented" list. I would specify the different times after choosing the data types. So far, I'm thinking of doing float, complex, and frozen set. I have never worked with the last two before, so I would expect them to take more time than float.

freakboy3742 commented 6 years ago

Keep in mind it's not just the "not_implemented" list that needs work. The not_implemented list is a list of automatically generated tests for operations, but there are still tests for the methods on objects. For example on the "float" datatype, the automatically generated tests will validate all the addition, subtraction and multiplication operations work; but there's still methods like conjugate(), hex(), is_integer()... those tests can't be automatically generated, because they don't follow a predictable pattern; we need to both write the tests, and the implementation for those methods.

What methods need to be implemented? Your Python shell can tell you:

# Create a float
>>> x = 1.2345
>>> dir(x)

dir() will list all the attributes and methods that are available. Many of them are "dunder" methods (double-underscore methods, like __add__), but there are a couple that aren't.
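A small sketch of how that dir() output can be split into dunder and non-dunder names, which makes the methods that need hand-written tests easier to spot. The exact contents of the lists depend on the Python version, so none are shown here.

```python
x = 1.2345

# Names that do not start with a double underscore: regular methods and attributes.
public = [name for name in dir(x) if not name.startswith("__")]

# The "dunder" names, such as __add__ and __mul__.
dunder = [name for name in dir(x) if name.startswith("__")]

print(public)
print(len(dunder), "dunder names")
```

For a float, the first list includes conjugate, hex and is_integer, the methods mentioned above.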

As for how long will it take? I can't answer that - because the answer for me will be different for you. That's one of the reasons we suggest trying to implement one of the operations during the pre-proposal process - it gives you a sense of what is there, and how complex it might be to complete something bigger.

The NotImplementedType __getitem__ method wasn't especially complicated; so, you might benefit from trying to implement something a little more complex. You've suggested FrozenSet as something you might want to look at; why don't you try implementing the union() and intersection() methods for FrozenSet, and see how long that takes.

Once you've got an idea of how long it takes to implement a slightly more complex method, you should be able to survey the methods that are needed, and work up a more detailed time estimate.

mdmjg commented 6 years ago

@freakboy3742 Okay! I have made several changes, as well as specifying that I would be implementing the Frozen Set, dictionary and the html.parser module from the Python standard library. Could you take a look so I can see if I should submit it now?

freakboy3742 commented 6 years ago

@mdmjg Could we possibly get this content on the ticket, rather than in a separate Google doc? That way, anyone in the community can provide feedback, and it's a better long-term archive for the project as a whole.

mdmjg commented 6 years ago

@freakboy3742 alright! I just edited it :)

freakboy3742 commented 6 years ago

@mdmjg Ok - this is a lot better in terms of detail; but there is a lot more work required.

In particular, you'll want to take a closer look at the list of methods you've proposed. I can tell you've got this list by running dir() on the data type and comparing it with the current implementations; but many of those methods already exist (e.g., the __r...__ operators), are inherited from base classes (e.g., __setattr__), or make no sense to implement (e.g., __new__).

I'm also very sceptical of a plan that proposes it will take 4 weeks to implement a couple of set operations, but will also take 4 weeks to implement an entire HTML parsing module. Unless you're proposing a wrapper around an existing library, I'd expect writing a HTML parser to be an entire GSoC project in itself. This schedule feels more like you've tried to fill 12 weeks with things that make you look busy, not an estimate based on considering the work required.

Don't worry about "looking busy". I'd much rather see a plausible, detailed project plan that will cover less work, than a packed schedule that seems unachievable.

mdmjg commented 6 years ago

@freakboy3742 I see. One question: when going through the methods, I saw that __getattr__ was already implemented, but one of the necessary methods is __getattribute__. Is it necessary to implement __getattribute__ or would it not make sense?

freakboy3742 commented 6 years ago

__getattribute__ and __getattr__ are closely related. The current master branch of Batavia hides this a bit; I'm working on a (very big) patch that will clean up that particular detail.

In general, ask yourself if the method is functionality on the datatype itself (how the data type behaves) or "meta" functionality (how the data type becomes a datatype in the first place). If it's a meta method (like __init__ or __getattr__), you can probably ignore it for your purposes.

mdmjg commented 6 years ago

@freakboy3742 Alright, I will not include it then. On another note, as you suggested, I will not implement the html parsing module. I still want to do both a data type and a module, so I have been looking at the time module and I want to do a partial implementation of it. The time module has three sections: Functions, Constants and Time Constants. The Functions section is already partially implemented; my goal would be to implement the rest of this specific section. After doing some research, I have a couple of questions:

1) Would it be okay for me to implement the remaining time functions? This means that I would not implement the entire remaining time module, but instead only the functions that are missing. (This sounds a bit abstract, but if you take a look at https://docs.python.org/3/library/time.html#module-time you can see what I'm referring to)

2) I have deleted some of the methods that you pointed out were unnecessary or repeated and I believe I can complete the two data types within a month. This may seem like slow progress but my classes end on June 15, and thus my progress would be a little slower during this time. After looking at documentation of previous GSoC proposals, I think it would be feasible for me to complete the remaining implementations of the time module functions in the last two months. In your experience, do you think this is feasible?

3) One of the functions from the time module, time.tzset(), relies on the os module, which has not been implemented. I believe I could implement the function using the Javascript moment library, but this would mean that the user would not set the timezone through os.environ, but instead would use a specific parameter that would not be required in CPython (For examples on how os.environ is necessary for this function, you can check out https://www.tutorialspoint.com/python/time_tzset.htm). My question is: would it be worth it to implement this function? I believe that implementing it would be useful, but it would sacrifice Batavia's ultimate goal of behaving exactly like CPython. Perhaps this specific function should only be implemented once the os module exists within Batavia.

Let me know if any of the questions didn't make sense and sorry for flooding your inbox with so many questions :)

freakboy3742 commented 6 years ago
  1. Implementing a couple of additional time methods seems a lot more plausible than the whole HTML Parser module. :-)

  2. I'm not sure if you've updated your proposal to reflect the methods that have been deleted; the list that I'm seeing as of right now still has a lot of methods in the timeline that don't need to be addressed.

  3. What you're hitting there is an occasion we need to fake something for rough compatibility. There's no real concept of an "environment" in Javascript, so the best we can hope for is to treat that environment as a configuration mechanism.

From an API perspective, the environment is just a set of key-value pairs (i.e., a dictionary). If os.environ was to just be an empty dictionary, that would be fine as a first pass. A more advanced approach would be to set values in a "pseudo-environment" that reflect the environment that the browser exposes. More advanced still would be to provide a mechanism for end users to configure that environment when starting the Python Virtual Machine.

If you want to incorporate any of those options into your proposal, feel free; but if it's easier to just put a TODO into the code when tzset reads os.environ, and treat that as an unimplemented feature, that would be fine too.
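The three levels described above can be sketched in a few lines. Everything here is hypothetical: the function names, the parameters and the way a timezone is looked up are assumptions made for illustration, not Batavia's API.

```python
def build_environ(browser_values=None, user_config=None):
    """Hypothetical helper that layers the three options on top of each other."""
    environ = {}                           # first pass: an empty environment
    environ.update(browser_values or {})   # next: values derived from the browser
    environ.update(user_config or {})      # finally: values supplied at VM start-up
    return environ


def requested_timezone(environ):
    """Hypothetical lookup a tzset() implementation might perform.

    Returns the TZ entry, or None when the environment does not define one.
    """
    # TODO: actually apply the zone to the time conversion rules.
    return environ.get("TZ")


empty = build_environ()
configured = build_environ({"TZ": "Europe/London"}, {"TZ": "US/Eastern"})

print(requested_timezone(empty))        # None
print(requested_timezone(configured))   # US/Eastern
```

In this sketch, values supplied by the end user take precedence over values derived from the browser. That ordering is a design assumption, not something stated in the thread.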

mdmjg commented 6 years ago

@freakboy3742 Oh yes, I hadn't updated my proposal cause I was waiting for your response so I could update it accordingly. I have made the changes now.

mdmjg commented 6 years ago

@freakboy3742 I have changed the methods for frozen set once again as you suggested. Also, after taking another look at the methods for dict, I realized that the necessary ones were already implemented, so I decided to take that part away altogether. I thought about including another data type but I think I should focus on the Frozen Set and time module for now. As you said, it's better to have a plausible project than a complex, unlikely one.

freakboy3742 commented 6 years ago

@mdmjg This is getting better; the one major detail you're perhaps missing is that FrozenSet and Set are exactly the same except for the fact that you can't modify an existing FrozenSet. So - if you're proposing to work on FrozenSet, referencing the extent to which code reuse with Set is possible is important. This was less important when FrozenSet was only part of the proposal, but now that it's a more significant part, it matters a lot more.
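One way to picture the code reuse in question is a shared base class that holds every non-mutating operation, with the mutable type adding only the methods that change the object. This is a minimal sketch with hypothetical class names; it does not reflect how Batavia's Set and FrozenSet are actually structured.

```python
class SetBase:
    """Hypothetical base: storage plus operations that never modify self."""

    def __init__(self, items=()):
        self._items = tuple(dict.fromkeys(items))   # drop duplicates

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __contains__(self, item):
        return item in self._items

    def union(self, other):
        # Returns a new object of the same type as self.
        return type(self)(list(self) + list(other))

    def intersection(self, other):
        other = list(other)
        return type(self)(x for x in self if x in other)

    def isdisjoint(self, other):
        return len(self.intersection(other)) == 0


class SimpleFrozenSet(SetBase):
    """Immutable variant: inherits everything and adds no mutators."""


class SimpleSet(SetBase):
    """Mutable variant: only the mutating methods are written separately."""

    def add(self, item):
        if item not in self._items:
            self._items += (item,)


frozen = SimpleFrozenSet([1, 2]).union([2, 3])
mutable = SimpleSet([1])
mutable.add(2)
print(sorted(frozen), sorted(mutable))   # [1, 2, 3] [1, 2]
```

With this layout, a method such as union() is written and tested once and is available on both types.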

We're running out of time for the final proposals to be in place, so if you can address that issue, you're probably ready to submit to the GSoC website.

mdmjg commented 6 years ago

@freakboy3742 Out of the methods I'm planning to implement, the only one I would reuse from Set would be copy (because it is the only one that has already been implemented in Set). Realizing this made me consider that perhaps I could also implement those methods on Set (by simply reusing the code I will write for Frozen Set). What do you think?

freakboy3742 commented 6 years ago

Unfortunately, this project was not selected for the 2018 GSoC. Thanks for applying!