IslandzVW / halcyon

InWorldz Halcyon 3d virtual reality world simulator
BSD 3-Clause "New" or "Revised" License

Initial Grid Setup: Going Public #298

Closed sonjamichelle closed 7 years ago

sonjamichelle commented 8 years ago

After hashing out most of the major issues I've got a pretty stable 20 region grid up and running, though it's kinda lonely on there with just the two of us. We'd like to open the grid to others, but right now the only way to create users is to do it manually via the user console, and I can't be at the console all the time. How do I set up a way to automate the user creation process for someone visiting the grid website?

Another thing I've tried to implement is buying land, for right now at $0, using landtool.php, though I failed miserably. I think it was because I tried using the landtool.php from the OpenSim wiki. Eventually I would like to implement an economy as well. Basic at first. Baby steps.

Main priority is user creation. I want to be able to invite others to the grid without having to manually create the accounts.

appurist commented 7 years ago

Ah, I didn't recognize that you kept the IP address the same too.

Did you copy over the MySQL configuration ini file as well?

Are you starting all the regions at once? (It could be that there aren't enough connections available by default in the new MySQL installation, and your former ini file had a higher default setting. Same version of MySQL on both machines?)

Although I would tend to assume first that some Halcyon config option isn't the same, next that it's a difference from having a fresh operating system installation at all, and after that a difference in the new OS version (there are likely tighter protections).

And please confirm, everything is running on one machine? Is the SQL server on that same host as all of the regions? There are lots of possibilities here (port, firewalls, etc) but not if they are all using the same connection string from the same machine.

sonjamichelle commented 7 years ago

No I didn't copy over the mysql.ini file, though no changes were made to the default config so it shouldn't matter. No I'm not starting the regions at once. I wait at least 30 to 45 seconds between region starts. I have my sql connection limit set to 1000 from the default 151 at the moment. That was causing startup issues on the other server setup. Hence that change here. Everything is running on the same machine, WHIP, Aperture, Grid Server, User Server, Messaging Server, MySQL Server.

As far as OS changes, I'm not sure why it would affect some regions and not all regions. About 25% of my regions are affected.

So far nothing has come to mind. I've tried backup copies, compiling from the latest master, creating a new ini and region xml and nothing solves the issue.

appurist commented 7 years ago

Okay, that answers most of my questions. The only one unanswered is MySQL versions. The MySQL defaults only apply if you have the same version installed (which if downloaded is unlikely). So my first suggestion is to grab both MySQL.ini files and diff them for a subtle difference.

After that, since it seems very selective, if you restart all the servers, is it always the same regions that fail? If so, do the automated diff of the Halcyon.ini and region.XML files to verify the differences between a working and a non-working region. The only thing you should see different in the ini file is the one port number.

And finally, that leads us to the port numbers. Is there any pattern to the port numbers of the working vs non-working regions? Is it, say, 9500-9509 (or 9020-9029) working and those above 9510 (or 9030) failing? Or vice versa? If you swap the port numbers of one region with another (both in Halcyon.ini and region.XML), does the working vs failing region status reverse?
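For anyone following along, the ini comparison described above could be sketched in Python roughly like this. It is only an illustration using the standard `difflib` module; the function names and the `http_listener_port` key in the usage note are hypothetical sample values, not taken from an actual Halcyon.ini.

```python
import difflib


def changed_lines(working_text, failing_text):
    """Return only the lines that differ between two config file bodies.

    Lines prefixed with "- " exist only in the working config,
    lines prefixed with "+ " exist only in the failing config.
    """
    diff = difflib.ndiff(working_text.splitlines(), failing_text.splitlines())
    # ndiff also emits "  " (unchanged) and "? " (hint) lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


def changed_lines_in_files(working_path, failing_path):
    """Same comparison, reading the two files from disk."""
    with open(working_path, encoding="utf-8") as f:
        working_text = f.read()
    with open(failing_path, encoding="utf-8") as f:
        failing_text = f.read()
    return changed_lines(working_text, failing_text)
```

If the two regions are configured consistently, a call such as `changed_lines("[Network]\nhttp_listener_port = 9020\n", "[Network]\nhttp_listener_port = 9021\n")` should return just the one pair of port lines. Anything beyond that is worth a closer look.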

appurist commented 7 years ago

Also, when you say the HTTP ACL script didn't make a difference, do you mean that you checked before applying it and those ACLs already existed? Or that you just ran it and it still failed? (A restart is needed before they will kick in.)

Also, my script there assumes you are using ports in the 800x range for services, 902x and 950x for regions. Sixteen ports in each range, from 9020-9035, and 9500-9515. (If you are using ports outside that range, of course you would need to modify the .CMD file.)

sonjamichelle commented 7 years ago

Yes it's always the same regions that fail.

I ran the cmd script after changing the port ranges to match what I'm using for my regions then restarted the machine.

Same version of MySQL.

Tried using the ini file from a working region, both with port number changes and without port number changes. No go on original port number and no go on new port number. The only thing that allows the region to start is changing the UUID. But doing that I lose everything on the region and it comes up as a new region.

appurist commented 7 years ago

ok in that case it sounds like the actual prim data. The command it runs is:

SELECT primitems.* FROM primitems 
    INNER JOIN prims 
    ON primitems.primID = prims.UUID AND prims.RegionUUID = nnnn

where nnnn is the region UUID. That's normally followed by a LIMIT that reads them 100 per call. But you might want to ignore the limit and just run a SELECT COUNT(*) with the same join and see what the number of records is, to see if it's possible something went nuts and left a few million matches, causing a timeout. (It's happened to us.)
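A variation on that count, grouped per region so an outlier stands out, might look like the sketch below. It runs against an in-memory SQLite database purely as a stand-in so the example is self-contained; the `primID`, `UUID` and `RegionUUID` columns come from the query above, while the `itemID` column, the sample rows and the function names are made up for illustration. On a real grid you would run the same SELECT in the mysql client against your own database.

```python
import sqlite3

# Count primitems per region, largest first, to spot a runaway region.
PER_REGION_COUNT = """
SELECT prims.RegionUUID, COUNT(*) AS item_count
FROM primitems
INNER JOIN prims ON primitems.primID = prims.UUID
GROUP BY prims.RegionUUID
ORDER BY item_count DESC
"""


def demo_connection():
    """Build a tiny stand-in database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE prims (UUID TEXT PRIMARY KEY, RegionUUID TEXT);
        CREATE TABLE primitems (itemID TEXT PRIMARY KEY, primID TEXT);
        INSERT INTO prims VALUES
            ('p1', 'region-a'), ('p2', 'region-a'), ('p3', 'region-b');
        INSERT INTO primitems VALUES
            ('i1', 'p1'), ('i2', 'p1'), ('i3', 'p2'), ('i4', 'p3');
    """)
    return conn


def items_per_region(conn):
    """Return (RegionUUID, item_count) rows, largest count first."""
    return conn.execute(PER_REGION_COUNT).fetchall()
```

With the sample rows, `items_per_region(demo_connection())` lists `region-a` with 3 items ahead of `region-b` with 1.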

Vinhold commented 7 years ago

I recall something like this happening on my grid, but the problem was with regions on one of my servers not being accessed, and that turned out to be a DNS problem with a mis-typed name. The problem is not with the region UUID or ports, so no changes need to be done there. It is with the Windows OS and perhaps firewall settings. I had also had problems with firewall settings, working out which entries needed to be put in, how they should be set up, and which settings did nothing!

Document each detail for each of your regions, and which ones you can get running and which ones you can't. That is the IP setting and port for each of the services, the DB and asset services, and each region, plus the firewall settings for each one. Do it in a way that makes it easy to compare the regions with each other. I suspect, as Jim does, that you will discover it is a blocked range of ports in Windows that is creating the problem with some of the regions.

When you find what the problem was, add that to the documentation as the solution to that problem. That information is a huge help later: if the problem shows up again, you will have a solution to check.

sonjamichelle commented 7 years ago

I get:

mysql> SELECT COUNT(*) FROM primitems;
+----------+
| COUNT(*) |
+----------+
|    36632 |
+----------+
1 row in set (0.92 sec)
appurist commented 7 years ago

Okay that looks reasonable, and that's for all the primitems on all regions. What is the COUNT for prims too?

The time seems a bit long, so it might be worth running the full select with the JOIN to see if that passes cleanly and quickly. If so, it's likely some problem with the connection from Halcyon, rather than the data. Yet it seems like all that matters here is the data, since it's specific regions. Try this:

SELECT COUNT(*) FROM inworldz_rdb.primitems
    INNER JOIN inworldz_rdb.prims 
    ON primitems.primID = prims.UUID;

It will be interesting to see the time on that one.

sonjamichelle commented 7 years ago
`mysql> SELECT COUNT(*) FROM inworldz.primitems
    ->     INNER JOIN inworldz.prims
    ->     ON primitems.primID = prims.UUID;
+----------+
| COUNT(*) |
+----------+
|    36618 |
+----------+
1 row in set (5.39 sec)`
appurist commented 7 years ago

Again a reasonable number. Not so reasonable response time though.

Hmm, I think the problem is that the MySQL server here is taking too long to respond to that query. 5.39 seconds is a long time, and I suspect there's a 5-second timeout on any db request here.

I just tried two of these queries on InWorldz main grid, one returned a count of 8150 and took 0.12 seconds, one returned a count of 14461 items and took 0.719 seconds. A response time of over 5 seconds might just be beyond the allowed threshold for these.

I think if there's a difference between the two machines, it may be the performance of the MySQL server (possibly on its own, or possibly indirectly the result of the different OS). You might want to really do that diff of the MySQL.ini files, perhaps there's a caching parameter or other performance tuning change that was made long ago that has been forgotten, or perhaps the MySQL versions aren't a match on the fresh install and the defaults are different. Or perhaps the new disk isn't as fast as the old one, or there's less memory for caching, etc. If you still have the db on the old machine, try the same queries there to see if the response times are comparable, or faster.

sonjamichelle commented 7 years ago

OK. Here's what I did based on your feedback. The new MySQL server is MySQL-Community 5.7.16 for Windows x64. The old MySQL server was MySQL-Community 5.7.16 for Linux Debian x64. So I went over to another server running 5.7.16 on Linux Debian and imported the database over to that machine. On the Windows server I changed the connection strings to reflect the new SQL DB IP, fired up a failing region, and lo and behold it started just fine.

Running your command here's what I got:

`mysql> SELECT COUNT(*) FROM inworldz.primitems
    ->     INNER JOIN inworldz.prims
    ->     ON primitems.primID = prims.UUID;
+----------+
| COUNT(*) |
+----------+
|    36618 |
+----------+
1 row in set (0.09 sec)`

A major difference. Not sure what is going on with the Windows Based MySQL server. But something is jacked.

appurist commented 7 years ago

Whoa. Yeah, that's quite a difference. 😀

The interesting thing here is that the original message told us it was a MySQL timeout, but we didn't really listen to it... well now we know what that one really means if it happens again.

In the meantime though, what to do about your server? Obviously you could continue to run it on Linux (if you have the machine and disk space there), but if you wanted to try to get the Windows version comparable, check out this article. In particular, if the database is InnoDB (perhaps the old one wasn't and the new one is creating InnoDB tables), check your mysql.ini file to see if innodb_flush_log_at_trx_commit is set to 1 (which flushes on every commit). See the comment at the bottom of that article. It looks here like you aren't the first person to run into this.

sonjamichelle commented 7 years ago

I tried a few of the methods outlined in several of the articles. The best I could get was 3 seconds. So I decided to keep it on the Linux Server. Plenty of space and resources on that box.

One thing I ran into though, after doing that, starting my new regions they complained that the table RdbHosts did not exist. It did however, though it was named rdbhosts. I had to rename it with caps to get the region to start. Not sure if this was a change with the new master branch. Just something to note.

EDIT

Turns out I had to add lower_case_table_names = 1 to my.cnf in order to get around the case sensitivity issues.

EDIT TO THE EDIT

Doing that broke the website for the grid. I had to find a back up DB that still had the case preserved. I have no idea what was going on with that version of MySQL for Windows, but it jacked things up big time!

sonjamichelle commented 7 years ago

What do I need to do in order to get offline messaging to work? Under halcyon.ini I have the following:

`[Messaging]
    ; Control which region module is used for instant messaging.
    ; Default is InstantMessageModule (this is the name of the core IM module
    ;  as well as the setting).
    InstantMessageModule = InstantMessageModule
    MessageTransferModule = MessageTransferModule
    OfflineMessageModule = OfflineMessageModule
    ; OfflineMessageURL = http://yourserver/Offline.php
    ; MuteListModule = MuteListModule
    ; MuteListURL = http://yourserver/Mute.php
`
appurist commented 7 years ago

Sorry, I missed the last question there. To use OfflineMessageModule, you'll need to specify the URL of a web service you provide (e.g. a PHP script) that supports two commands, SaveMessage and RetrieveMessages. Both of these take a user UUID as the next element in the URL.

This is only partly documented, but there is a sample .PHP implementation here. See step 2: offline.php. That implementation includes storage in a MySQL database, but it could be anything (even text files). It really depends on the needs of your grid.

sonjamichelle commented 7 years ago

Where is the regionhandle stored for the regions?

appurist commented 7 years ago

The region handle is stored in the regions table with the other info, but just represents an encoding of the position. For example, in LLClientView.cs, this code converts a region handle back to X,Y map coordinates:

uint regionX;
uint regionY;
Utils.LongToUInts(Scene.RegionInfo.RegionHandle, out regionX, out regionY);
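To illustrate the encoding, here is a rough Python equivalent. It assumes the packing used by the OpenMetaverse library (X in the high 32 bits, Y in the low 32 bits, both as world coordinates in metres) and the standard 256 m region size; the function names are made up for this sketch.

```python
REGION_SIZE = 256  # metres per region side (assumed standard size)


def handle_to_world(handle):
    """Split a 64-bit region handle into (x, y) world coordinates in metres."""
    return handle >> 32, handle & 0xFFFFFFFF


def handle_to_map(handle):
    """Convert a region handle to (x, y) map grid coordinates."""
    world_x, world_y = handle_to_world(handle)
    return world_x // REGION_SIZE, world_y // REGION_SIZE


def map_to_handle(map_x, map_y):
    """Build the region handle for a region at the given map grid position."""
    return ((map_x * REGION_SIZE) << 32) | (map_y * REGION_SIZE)
```

Under those assumptions a region at map position 1000,1000 has the handle 1099511628032000, and `handle_to_map(1099511628032000)` gives back `(1000, 1000)`. The handle is a pure function of the map position, so it carries no information beyond the coordinates.
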
sonjamichelle commented 7 years ago

Is that the only place it is stored? I have a region that keeps changing from what I have my apparent default region set to. With each restart new users can no longer log in until I manually change that regionhandle back to where the userserver is trying to put the user at.

appurist commented 7 years ago

You can't change the region handle, it's auto-calculated from the map location. To change it, shut the region down, change the map location in the bin/regions/?.xml file (whatever you called it), then restart the region.

sonjamichelle commented 7 years ago

Ok, maybe I'm approaching the problem from the wrong angle. When the region restarts it gets a new regionhandle. The userserver keeps wanting to use a different regionhandle for the defaultregion. So I guess what I should be asking is: is there somewhere the defaultregion is stored, where this old value keeps creeping up, so I can enter the new value?

Sorry, my brain is a bit muddled right now, recovering from a major snafu.

sonjamichelle commented 7 years ago

I found a command in the userserver "default regions [<filename>]". I'm assuming I can correct this problem by using this command to specify a default region by feeding the values from the named file. However, what is the format of this file? I can't seem to find anything anywhere.

sonjamichelle commented 7 years ago

I found where it was coming from. UserServer_Config.xml.

default_X="1000" default_Y="1000"

Once I found this I just needed to set it to my default regions co-ords and things fell into place. Mostly.

appurist commented 7 years ago

> Ok, maybe I'm approaching the problem from the wrong angle. When the region restarts it gets a new regionhandle. The userserver keeps wanting to use a different regionhandle for the defaultregion.

What is the actual symptom here? There's no such thing as the wrong region handle. Why are we discussing region handles at all here?

However I think when you said default region, I interpreted that as your login/welcome region, but you're referring to something else? Where are you seeing this, that it looks like the wrong value?

sonjamichelle commented 7 years ago

The region that was being set up as the new user's starting point wasn't at 1000,1000. It was off elsewhere. So a new user would get the no available regions to log in, "is the grid down?" error from the viewer. The way I was getting around this was manually entering in the DB the region handle that the userserver was looking for. This allowed users to log in while I tracked down the problem. I KNEW there was a file or db setting that was telling the server what the default region was supposed to be, and it was not what I wanted it to be. I just couldn't find it. I guess I just didn't express what I was after clearly. As I mentioned in the above post, I found the pesky setting and made the corrections. Things are working as they should now. New users are now routed to the proper region. I just had to change the 1000,1000 to my grid co-ords.

appurist commented 7 years ago

ok looks like we posted at the same time. The User server handles routing the user to various places according to the viewer login options chosen. A specific region location obviously doesn't need a default region, nor do the Home or Last login locations. But if for some reason those locations aren't available, such as if it is the user's first login, or if the specified region is down (or invalid as in the case above), then the User server needs to know where to put them. That XML file does define that, but the console command(s) provide easier and more powerful options.

It is very traditional for all grids to default to 1000,1000. Even SL's 1000,1000 region is DaBoom, which is the original region (location of the big bang). If I were you, I would leave the default location in the XML as 1000,1000. Actually, ideally, you start your grid there like other grids (e.g. position your welcome/home region at 1000,1000). Strictly speaking, you don't need to, but if the grid is still small, and not used by many, you might want to consider moving everything relative to that. Mapping code will default to being centered on that. There are probably other cases too. If you already have a bunch of regions in use, that might not really be an option though. It's not that big of a deal either way, probably a bit of pain either way, but there should be workarounds for whatever you encounter. (If not we'll add them!)

The console commands (there are actually two, default logins and default regions) and their corresponding text files, make the XML file irrelevant, and refer to a list of locations to place incoming users, that may all be in the same region, or different regions. The storage for those settings is simple text files in the bin folder of the User server. The format is just a series of lines, each with regionName/x/y/z. So something like:

Test3/124/124/25
Test2/124/124/25
Test3/124/128/25
Test2/124/128/25
Test3/128/124/25
Test2/128/124/25
Test3/128/128/25
Test2/128/128/25

One of the benefits of this is that if your welcome region is down, you can specify backups. Another benefit is that you can specify multiple specific locations in the region, so that new arrivals don't pile up on top of each other if they don't move. (Eventually they will, but if you include something like 4 locations, they won't start piling up unless the first one still hasn't moved yet when the fifth one arrives.)
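To make the file format concrete, here is a small Python sketch that parses lines of the form regionName/x/y/z and hands out locations in rotation. The parsing follows the format described above; the round-robin selection is only my illustration of the "don't pile up" idea, not a description of how the User server actually chooses, and the class and function names are hypothetical.

```python
from itertools import cycle
from typing import NamedTuple


class LoginLocation(NamedTuple):
    region: str
    x: float
    y: float
    z: float


def parse_locations(text):
    """Parse regionName/x/y/z lines, skipping blank lines."""
    locations = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split from the right so only the last three fields are coordinates.
        region, x, y, z = line.rsplit("/", 3)
        locations.append(LoginLocation(region, float(x), float(y), float(z)))
    return locations


def location_rotation(locations):
    """Yield locations endlessly in file order (illustrative round-robin)."""
    return cycle(locations)
```

For example, `parse_locations("Test3/124/124/25\nTest2/124/124/25\n")` returns two entries, and calling `next()` repeatedly on `location_rotation(...)` alternates between them, so a third arrival lands back on the first spot.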

There are two such files:

You can force a reload of the file without restarting User by issuing the command to read a file, e.g.:

Yes, the names are kind of confusing and I can barely keep them straight but this tends to be something you set once and forget. Help is available via help default regions and help default logins although if I remember correctly there isn't help for help default.

sonjamichelle commented 7 years ago

I guess this is a heads up: I downloaded the latest master, compiled and ran it. All parcels were set to "Avatars on other parcels can NOT see and chat with avatars on this parcel." and the setting could not be changed. The other issue was that all other avatars on the grid showed up as clouds to other avatars. I redownloaded the latest release available and applied it to my regions and the issue was resolved.

appurist commented 7 years ago

Parcel privacy isn't implemented yet; it's just that the latest master includes an updated OMV library that supports the packet formats needed for us to implement it. So all of that is working as designed at this point.

It's good to hear that it seems to be working other than that, because the OMV updates include some changes to inventory types and other base definitions that I've tried to keep compatible, but it has not undergone testing yet. (Latest master is always a bit of an unknown, until we push it to a release.) I haven't had confirmation yet that the compatibility code I added for the MySQL side of inventory was good, but I think your test confirms it, as it's unlikely you are running Apache Cassandra for inventory (which is the alternative to MySQL). So that is good news too.

I guess the default regions stuff is sorted out too; you can of course do that either way, with the XML file defining just the region coordinates, or the TXT files defining one or more region name and x/y/z combinations. Anything that works for you is good! I appreciate this feedback of where you're running into issues or questions as it's a pretty good diary of areas we could use improved (or any) documentation on. You're on the bleeding edge here so it's great to get this kind of feedback.

I don't know when we'll have parcel privacy, group bans, Project Bento support, etc. It's just that this OMV update removes the roadblocks that prevented us from working on that. It's a prerequisite step that has now been completed, and one that in some of those cases is a bigger step than actually implementing the feature. From my previous investigation, parcel privacy is the easiest of the missing features to provide, so it may come first since it has been in demand for quite some time now. You can follow this in the older Mantis report 2200.

sonjamichelle commented 7 years ago

I'm going to take that as my questions and pestering has been doing some good. ;-) I'm sure it's nice to have someone on the "bleeding edge" willing to test things out and run into issues before things get rolled out to production levels. One thing I know how to do is break stuff and push things to their limit. ;-D

The default region has been worked out, thank you @appurist for the help and information. And thank you @Vinhold for the notes you dropped to me in email.

Right now I have 48 regions up and running relatively smoothly: 36 mainland regions, 2 welcome/tutorial regions and 10 outlying regions. I think major development has stabilized for the time being. Thanks to @Vinhold, I have a decent website interface up and running, and when Anaximander2 comes out I should have a nice map interface to go with it. Now it's down to some website tweaks and developing the land from inworld. Lots of time in 3DS Max for me and Photoshop for my wife in the coming months.

I'll definitely keep up with server code changes and implement things and keep on the bleeding edge, do my part in the OpenSource way of things ;-)

Y'all have been a great help so far and I thank you!

appurist commented 7 years ago

Yes, I think this thread is a wonderful example of the win-win that happens when someone is willing to blaze that trail on the bleeding edge. ;) We've got a lot of areas we can improve on and are now aware of from this thread. And you're up and running!

Suggestion though: If you consider the initial setup to be complete now, we might want to close this thread and start a new Issue if there's a specific new problem going forward. It's easier for others to find issues if the subject is more specific and the thread covers only one topic or aspect. This thread has been good to have for all the many things related to setup, but it may be time for separate threads now with any new issues. I'll leave that up to you though. It may depend on what the next issue is.

Edit: I've edited the subject to help anyone trying to find info on Halcyon setup.

sonjamichelle commented 7 years ago

Yes, I'd say we could close this issue as the main bumps and grinds of initial setup are out of the way. I do wish there was a more informal general chat/communication section for general items that really aren't issue worthy, though. Anyhow, I'll go ahead and close the thread. Again, thanks for the help!